Sphinx and friend search - ruby-on-rails-3

I have a high-level question about Sphinx on Ruby on Rails:
Say I want to search all the articles written by my friends.
The model structure would be:
User has_many users :through relationships
Article belongs_to User
My question is: what sort of syntax would I use on Sphinx so the user searches for articles and gets only the articles written by his friends? I am having trouble finding this online and I'd like to have a handle on how this would work before I implement my solution.
NOTE: I suspect one solution is to have an array of friend IDs and then use a :condition :with => {:id => array_of_friendIDs}. But maybe there is a more effective way of doing it?

You're basically correct - you're going to need to assemble an array of friend ids and pass it to the search, using the :with option. How you fetch that list isn't particularly related to Sphinx, though.
friend_ids = current_user.users.pluck(:id)
@articles = Article.search(params['search_term'], :with => {:user_id => friend_ids})
Using .pluck saves a good bit of time that would otherwise go into instantiating a bunch of User objects, when you only need their ids (note that pluck requires Rails 3.2 or later). Make sure that you've set user_id as an attribute for Sphinx (using has user_id in the define_index block).
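As a rough mental model of what the :with filter does, here is a plain-Ruby stand-in with no Sphinx involved; the Article struct and the search_articles helper are hypothetical. A text match is kept only if its user_id attribute equals one of the supplied ids, which is how an array value under :with behaves.

```ruby
require "set"

# Hypothetical in-memory stand-in for an indexed Article with a user_id attribute.
Article = Struct.new(:id, :title, :user_id)

# Keeps articles whose title contains the term AND whose user_id is one of
# friend_ids, mimicking :with => {:user_id => friend_ids} (an "any of" match).
def search_articles(articles, term, friend_ids)
  allowed = Set.new(friend_ids)
  articles.select do |article|
    article.title.downcase.include?(term.downcase) && allowed.include?(article.user_id)
  end
end

articles = [
  Article.new(1, "Rails tips", 10),
  Article.new(2, "Rails scaling", 20),
  Article.new(3, "Cooking", 10)
]
search_articles(articles, "rails", [10, 30]).map(&:id) # => [1]
```

Article 2 matches the text but is dropped because user 20 is not in the friend list; article 3 has the right author but no text match.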

Related

How can I query a rails 3 app efficiently?

I have a search form that queries one table in the database but there are many parameters (language, level, creator etc). The code below works provided the fields in question are filled in but I want to change it to:
a) add more parameters (there are several);
b) allow for a field to be empty
Here's the code in the controller:
@materials = Material.find(:all, :conditions => {:targ_lang => params["targ_lang"],
                                                 :inst_lang => params["inst_lang"],
                                                 :level => params["level"]})
I'm totally new to this, I'm afraid, but a lot of the documentation suggests I should be using "where".
Since Rails 3 you can use the where() function:
@materials = Material.where(targ_lang: params["targ_lang"], inst_lang: params["inst_lang"], level: params["level"])
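Part b) of the question, allowing a field to be empty, can be handled by passing where only the search params that were actually filled in. A plain-Ruby sketch, where SEARCH_FIELDS and search_conditions are hypothetical names (in a Rails app you could use present?/blank? instead of the manual strip check):

```ruby
# Hypothetical whitelist of search-form fields; add more parameters here (part a).
SEARCH_FIELDS = %w[targ_lang inst_lang level creator].freeze

# Builds a conditions hash from the whitelisted params, skipping any field that
# was left empty (part b), so it can be passed straight to Material.where(...).
def search_conditions(params)
  SEARCH_FIELDS.each_with_object({}) do |field, conditions|
    value = params[field]
    next if value.nil? || value.to_s.strip.empty?
    conditions[field.to_sym] = value
  end
end

params = { "targ_lang" => "fr", "inst_lang" => "", "level" => "B1" }
search_conditions(params) # returns { targ_lang: "fr", level: "B1" }
# In the controller: @materials = Material.where(search_conditions(params))
```

Whitelisting keeps arbitrary request params out of the query. If every field is blank the hash is empty, and where({}) simply returns all records.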
Also, you could take a look at scopes. These let you define a query in the model and call it from the controller, for example:
class Material < ActiveRecord::Base
  scope :active, where(active_state: true)
end
Then in the controller you do something like:
@active_materials = Material.active
This can be useful if you are joining several models and want to keep your controllers less messy.
To conclude, as @RVG said, SearchLogic is quite useful, and there are others such as Sphinx and Elasticsearch. Take a quick look at these and use the one you feel most comfortable with.
If you are adding search functionality to your app, I suggest using the SearchLogic gem.
It is easy to use and effective.
SearchLogic
RailsCasts for searchlogic

An efficient way to track user login dates and IPs history

I'm trying to track user login history for stats purposes, but it's not clear to me what the best way to go about it would be. I could have a separate table that records users and their login stats with a date, but that table could get REALLY big. Alternatively, I could track some historic fields in the User model/object itself in a parseable field and just update it (them) with some delimited string format: e.g. split on :, get the last item, and if its date code isn't today, add an item (date+count), otherwise increment it, then save it back. At least with this second approach it would be easy to remove old items (e.g. only keep 30 days of daily logins, or IPs), whereas a separate table would require a task to delete old records.
I'm a big fan of instant changes. Tasks are useful, but can complicate things for maintenance reasons.
Anyone have any suggestions? I don't have an external data caching solution up or anything yet. Any pointers are also welcome! (I've been hunting for similar questions and answers)
Thanks!
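For comparison, the delimited-string approach from the question might look roughly like this in plain Ruby. The YYYYMMDD=count item format, the : separator between items and the record_login name are assumptions made for this sketch, not an established convention.

```ruby
require "date"

# Hypothetical format: items separated by ":", each item "YYYYMMDD=count",
# e.g. "20240101=3:20240102=1". Returns the updated string after recording
# one login on `today`, keeping only the last `keep_days` days.
def record_login(history, today: Date.today, keep_days: 30)
  items = history.to_s.split(":").map do |item|
    day, count = item.split("=")
    [Date.strptime(day, "%Y%m%d"), count.to_i]
  end

  if items.last && items.last[0] == today
    items.last[1] += 1      # same day as the last item: increment its count
  else
    items << [today, 1]     # new day: append a fresh item
  end

  cutoff = today - (keep_days - 1)          # oldest day still kept
  items.select { |day, _count| day >= cutoff }
       .map { |day, count| "#{day.strftime('%Y%m%d')}=#{count}" }
       .join(":")
end

history = record_login("", today: Date.new(2024, 1, 1))      # "20240101=1"
history = record_login(history, today: Date.new(2024, 1, 1)) # "20240101=2"
history = record_login(history, today: Date.new(2024, 1, 2)) # "20240101=2:20240102=1"
```

The pruning happens on every write, so no separate cleanup task is needed; the trade-off is that the whole string is parsed and rewritten on each login and cannot be queried with SQL.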
If you have Devise's :trackable module enabled, I found this the easiest way. In the User model (or whichever model you're authenticating):
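def update_tracked_fields!(request)
  old_signin = self.last_sign_in_at
  super # let Devise update the tracked fields and save the record
  # only write an audit entry when the sign-in timestamp actually changed
  if self.last_sign_in_at != old_signin
    Audit.create :user => self, :action => "login", :ip => self.last_sign_in_ip
  end
end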
(Inspired by https://github.com/plataformatec/devise/wiki/How-To:-Turn-off-trackable-for-admin-users)
There is a nice way to do that through Devise.
Warden sets up a hook called after_set_user that runs after a user is set. So, supposing you have a Login model with ip, logged_in_at and user_id fields, you can create a record using only these fields (the :except => :fetch option keeps the hook from firing every time the user is merely fetched from the session on later requests):
Warden::Manager.after_set_user :except => :fetch do |record, warden, options|
  Login.create!(:ip => warden.request.ip, :logged_in_at => Time.now, :user_id => record.id)
end
Building upon @user208769's answer: the core Devise::Models::Trackable#update_tracked_fields! method now calls a helper method named update_tracked_fields before saving. That means you can use the ActiveModel::Dirty helpers to make it a little simpler:
def update_tracked_fields(request)
  super
  if last_sign_in_at_changed?
    Audit.create(user: self, action: 'login', ip: last_sign_in_ip)
  end
end
This can be simplified even further (and be more reliable given validations) if audits is a relationship on your model:
def update_tracked_fields(request)
  super
  audits.build(action: 'login', ip: last_sign_in_ip) if last_sign_in_at_changed?
end
Devise supports tracking the last sign-in date and the last sign-in IP address with its :trackable module. To use it, add this module to your user model and add the correct fields to your database, which are:
:sign_in_count, :type => Integer, :default => 0
:current_sign_in_at, :type => Time
:last_sign_in_at, :type => Time
:current_sign_in_ip, :type => String
:last_sign_in_ip, :type => String
You could then override Devise::SessionsController and its create action to save :last_sign_in_at and :last_sign_in_ip to a separate table in a before_create callback. You should then be able to keep them for as long as you like.
Here's an example (scribd_analytics)
create_table 'page_views' do |t|
  t.column 'user_id', :integer
  t.column 'request_url', :string, :limit => 200
  t.column 'session', :string, :limit => 32
  t.column 'ip_address', :string, :limit => 16
  t.column 'referer', :string, :limit => 200
  t.column 'user_agent', :string, :limit => 200
  t.column 'created_at', :timestamp
end
Add a whole bunch of indexes, depending on queries
Create a PageView on every request
We used a hand-built SQL query to take out the ActiveRecord overhead on this
Might try MySQL's 'insert delayed'
Analytics queries are usually hand-coded SQL
Use 'explain select' to make sure MySQL is using the indexes you expect
Scales pretty well
BUT analytics queries are expensive and can clog up the main DB server
Our solution:
use two DB servers in a master/slave setup
move all the analytics queries to the slave
http://www.scribd.com/doc/49575/Scaling-Rails-Presentation-From-Scribd-Launch
Another option to check out is Gattica with Google Analytics.
I hate answering my own questions, especially given that you both gave helpful answers. I think answering my question with the approach I took might help others, in combination with your answers.
I've been playing with the Impressionist gem (the only useful page-view gem since the abandoned RailStat) with good results so far. After setting up the basic migration, I found that the expected usage follows Rails' MVC design very closely. If you add "impressionist" to a controller, it will go looking for the model when logging the page view to the database. You can modify this behaviour or just call impressionist yourself in your controller (or anywhere, really) if, like me, you happen to be testing it out on a controller that doesn't have a model.
Anyway, I got it working with Devise to track successful logins by overriding Devise::SessionsController and just calling the impressionist method for @current_member (don't forget to check whether it's nil, as it will be on a failed login):
class TestSessionController < Devise::SessionsController
  def create
    unless @current_member.nil?
      impressionist(@current_member)
    end
    super
  end
end
Adding it to other site parts later for some limited analytics is easy to do. The only other thing I had to do was update my routes to use the new TestSessionController for the Devise login route:
post 'login' => 'test_session#create', :as => :member_session
Devise works as normal without my having to modify it in any way, and my impressionist DB table is indexed and logging logins. I'll just need a rake task later to trim it weekly or so.
Now I just need to work out how to chart daily logins without having to write a bunch of looping, dirty queries...
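Counting logins per day does not need a loop of queries: in SQL it is a single GROUP BY on the date part of the timestamp (in ActiveRecord, something like Login.group("DATE(created_at)").count, assuming a logins table with a created_at column). The same grouping in plain Ruby over an in-memory list of timestamps, with daily_login_counts as a hypothetical helper:

```ruby
require "date"  # also adds Time#to_date

# Groups login timestamps by calendar day (in each Time's own zone) and counts
# them: the in-memory equivalent of
#   SELECT DATE(created_at), COUNT(*) FROM logins GROUP BY DATE(created_at)
def daily_login_counts(timestamps)
  timestamps
    .group_by(&:to_date)
    .transform_values(&:count)
    .sort
    .to_h
end

logins = [
  Time.utc(2024, 1, 1, 9, 15),
  Time.utc(2024, 1, 1, 18, 40),
  Time.utc(2024, 1, 2, 7, 5)
]
daily_login_counts(logins)
# returns a Hash mapping Date 2024-01-01 => 2 and Date 2024-01-02 => 1
```

For charting, letting the database do the grouping is usually preferable, since only one row per day comes back instead of every login.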
There is also the paper_trail gem, which lets you track model changes.

How to retrieve both "associated" records and "associated through" records in a performant way?

I am using Ruby on Rails 3.1 and I am trying to improve an SQL query in order to retrieve both "associated" records and "associated through" records (ActiveRecord::Associations) in a performant way, so as to avoid the "N + 1 query problem". That is, I have:
class Article < ActiveRecord::Base
has_many :category_relationships
has_many :categories,
:through => :category_relationships
end
class Category < ActiveRecord::Base
has_many :article_relationships
has_many :articles,
:through => :article_relationships
end
In a couple of SQL queries (that is, in a "performant way", maybe by using the Ruby on Rails includes() method) I would like to retrieve both categories and category_relationships, or both articles and article_relationships.
How can I make that?
P.S.: I am trying to improve queries like the following:
@category = Category.first
articles = @category.articles.where(:user_id => @current_user.id)
articles.each do |article|
  # Note: In this example the 'inspect' method is just a method to "trigger" the
  # "eager loading" functionalities
  article.category_relationships.inspect
end
You can do
Article.includes(:category_relationships => :categories).find(1)
Which will reduce this to 3 queries (1 for each table). For performance, also make sure your foreign keys have an index.
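Roughly speaking, in its preload mode includes avoids N + 1 queries by issuing one query per table and then matching rows to their parents by foreign key in memory. A plain-Ruby sketch of that stitching step, using hypothetical structs rather than ActiveRecord (which does this for you):

```ruby
# Hypothetical rows, as three separate queries would return them.
ArticleRow  = Struct.new(:id, :title)
RelationRow = Struct.new(:id, :article_id, :category_id)
CategoryRow = Struct.new(:id, :name)

# Stitches the three result sets together in memory: for each article id,
# returns its relationship rows and the categories reached through them.
def preload(articles, relationships, categories)
  rels_by_article = relationships.group_by(&:article_id)   # one pass, no per-article query
  category_by_id  = categories.to_h { |c| [c.id, c] }

  articles.to_h do |article|
    rels = rels_by_article.fetch(article.id, [])
    [article.id, { relationships: rels,
                   categories: rels.map { |r| category_by_id[r.category_id] }.compact }]
  end
end

articles      = [ArticleRow.new(1, "A"), ArticleRow.new(2, "B")]
relationships = [RelationRow.new(10, 1, 100), RelationRow.new(11, 1, 101)]
categories    = [CategoryRow.new(100, "Ruby"), CategoryRow.new(101, "SQL")]
result = preload(articles, relationships, categories)
result[1][:categories].map(&:name) # => ["Ruby", "SQL"]
```

However many articles there are, the work stays at three result sets plus hash lookups, which is why the query count no longer grows with N.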
But in general, I'm curious why the "category_relationships" entity exists at all, and why this isn't a has_and_belongs_to_many sort of situation?
Updated
As per your changed question, you can still do
Category.includes(:article_relationships => :articles).first
If you watch the console (or tail log/development.log), you'll see that when you call the associations, they hit the cached values and you're golden.
But I am still curious why you're not using a Has and Belongs To Many association.

Cancan Thinking Sphinx current_ability Questions

I'm trying to get CanCan working with Thinking Sphinx but am running into some issues.
Before using sphinx, I had this in my companies view:
@companies = Company.accessible_by(current_ability)
That prevented my users from seeing anyone else's companies...
After installing sphinx, I ended up with:
@companies = Company.accessible_by(current_ability).search(params[:search], :include => :order, :match_mode => :extended ).paginate(:page => params[:page])
Which now displays all my companies and isn't refining per user based on ability.
It would seem TS isn't set up for CanCan?
I think it's more that accessible_by is probably a scope - which is Database/SQL-driven. Sphinx has its own query interface, and so ActiveRecord scopes don't apply.
An inefficient workaround (gets all companies first):
company_ids = Company.accessible_by(current_ability).collect &:id
@companies = Company.search params[:search],
  :include    => :order,
  :match_mode => :extended,
  :page       => params[:page],
  :with       => {:sphinx_internal_id => company_ids}
A couple of things to note: sphinx_internal_id is the indexed model's primary key - Sphinx has its own unique identifier named id, hence the distinction. Also: You don't want to call paginate on a search collection - Sphinx always paginates, so just pass the :page param through to the search call.
There'd be two better workarounds that I can think of - either have a Sphinx equivalent of accessible_by, with the relevant information added to your indices as attributes - or, simpler if not quite as ideal, just get the company ids returned in the first line of my above snippet without loading up every company as an ActiveRecord object. Both will probably mean bypassing and/or duplicating Cancan's helpers.
Although... maybe this would do the trick, taking the latter approach:
sql = Company.accessible_by(current_ability).select(:id).to_sql
company_ids = Company.connection.select_values sql
@companies = Company.search params[:search],
  :include    => :order,
  :match_mode => :extended,
  :page       => params[:page],
  :with       => {:sphinx_internal_id => company_ids}
Avoids loading unnecessary Company objects, uses the Cancan helper (provided it is/returns a scope), and works neatly with what Sphinx/Thinking Sphinx expects. I've not used Cancan though, so this is a bit of guesswork.

Constructing a has-and-belongs-to-many query

I have a Rails app (running on version 2.2.2) that has a model called Product. Product is in a has-and-belongs-to-many relationship with Feature. The problem is that I need to have search functionality for the products. So I need to be able to search for products that have a similar name, and some other attributes. The tricky part is that the search must also return products that have the exact set of features indicated in the search form (this is represented by a bunch of checkboxes). The following code works, but it strikes me as rather inefficient:
@products = Product.find(:all, :conditions => ["home=? AND name LIKE ? AND made_by LIKE ? AND supplier LIKE ? AND ins LIKE ?", hme, '%'+opts[0]+'%', '%'+opts[1]+'%', '%'+opts[3]+'%', '%'+opts[4]+'%'])
# see if any of these products have the correct features
unless params[:feature_ids].nil?
  # sort both sides so the comparison does not depend on id order
  f = params[:feature_ids].collect { |i| i.to_i }.sort
  @products.delete_if { |x| x.feature_ids.sort != f }
end
I'm sorry that my grasp of rails/sql is so weak, but does anyone have any suggestions about how to improve the above code? Thanks so much!
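Another way to express the "exact set of features" check is a set comparison, which ignores both the order of the ids and any duplicates. A plain-Ruby sketch with a hypothetical Product struct standing in for the ActiveRecord model; note that the filtering still happens in Ruby after the query, so it does not reduce the query cost itself.

```ruby
require "set"

# Hypothetical stand-in for the Product model (feature_ids as the HABTM would return them).
Product = Struct.new(:name, :feature_ids)

# Keeps only products whose feature ids are exactly the selected set,
# regardless of order. A nil selection leaves the list unfiltered,
# mirroring the nil check on params[:feature_ids] in the question.
def with_exact_features(products, selected_ids)
  return products if selected_ids.nil?
  wanted = Set.new(selected_ids.map(&:to_i))  # checkbox values arrive as strings
  products.select { |product| Set.new(product.feature_ids) == wanted }
end

products = [
  Product.new("A", [3, 1]),
  Product.new("B", [1, 3, 5]),
  Product.new("C", [1])
]
with_exact_features(products, %w[1 3]).map(&:name) # => ["A"]
```

Product B is rejected because it has an extra feature, and product A matches even though its ids come back in a different order than the checkboxes.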
First, I would recommend writing a FeatureProduct join model by hand (instead of using the default has_and_belongs_to_many), e.g.:
class FeatureProduct < ActiveRecord::Base
  belongs_to :feature
  belongs_to :product
end

class Product < ActiveRecord::Base
  has_many :feature_products
  has_many :features, :through => :feature_products
end

class Feature < ActiveRecord::Base
  has_many :feature_products
  has_many :products, :through => :feature_products
end
For the search: you may find the SearchLogic gem to be exactly what you need. It supports 'LIKE' conditions (which lets you write your query in a more 'Rails way'). It also supports searching with conditions on a related model (your Feature model, to be more precise).
The solution would be something like:
search = Product.search
search.name_like = opts[0]
search.made_by_like = opts[1]
...
search.feature_products_id_equals = your_feature_ids
...
@product_list = search.all
There is also an excellent screencast explaining the use of this gem.
Good luck :)