writing a caching version of Mechanize - ruby-on-rails-3

I'd like a caching version of Mechanize. The idea is that #get(uri...) checks to see if that uri has been previously fetched, and if so, fetch the response from the cache rather than hitting the web. If not in the cache, it hits the web and saves the response in the cache.
My naive approach doesn't work. (I probably don't need to mention that CachedWebPage is a subclass of ActiveRecord::Base):
class CachingMechanize < Mechanize
def get(uri, parameters = [], referer = nil, headers = {})
page = if (record = CachedWebPage.find_by_uri(uri.to_s))
record.contents
else
super.tap {|contents| CachedWebPage.create!(:uri => uri, :contents => contents)}
end
yield page if block_given?
page
end
end
This fails because the object returned by Mechanize#get() is a complex, circular structure that neither YAML nor JSON wants to serialize for storage in the database.
I realize that what I want is to capture the low-level contents before Mechanize parses it.
Is there a clean way to do this? I think I can use Mechanize's post_connect hook to access the raw page coming in, but I don't see how to subsequently pass the cached raw page to Mechanize for parsing.
Is there some package I should be using that does web page caching already?

It turns out the solution was simple, albeit not entirely clean. It's a simple matter to cache the results of Mechanize#get() like this:
class CachingMechanize < Mechanize
def get(uri, parameters = [], referer = nil, headers = {})
WebCache.with_web_cache(uri.to_s) { super }
end
end
... where with_web_cache() uses YAML to serialize and cache the object returned by super.
My problem was that by default, Mechanize#get() returns a Mechanize::Page object containing some lambda object, which cannot be dumped and loaded by YAML. The fix was to eliminate those lambdas, which turned out to be rather simple. Full code follows.
class CachingMechanize < Mechanize
def initialize(*args)
super
sanitize_scheme_handlers
end
def get(uri, parameters = [], referer = nil, headers = {})
WebCache.with_web_cache(uri.to_s) { super }
end
# private
def sanitize_scheme_handlers
scheme_handlers['http'] = SchemeHandler.new
scheme_handlers['https'] = scheme_handlers['http']
scheme_handlers['relative'] = scheme_handlers['http']
scheme_handlers['file'] = scheme_handlers['http']
end
class SchemeHandler
def call(link, page) ; link ; end
end
end
The moral: don't try to YAML.dump and YAML.load objects containing a lambda or proc.
This goes beyond just this example: if you see a YAML error that reads:
TypeError: allocator undefined for Proc
Check to see if there's a lambda or proc in the object you're trying to serialize and deserialize. If you are able (as I was in this case) to replace the lambda with a method call to an object, you should be able to work around the problem.
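To make the workaround concrete, here is a minimal sketch in plain Ruby, outside of Mechanize. FakePage and PassThroughHandler are hypothetical stand-ins (they are not Mechanize classes), and the round-trip helper is only an assumption about how the cache serializes values. It shows that an object holding a lambda does not survive a YAML round trip, while the same object holding a plain callable object does.

```ruby
require "yaml"

# Hypothetical stand-in for a page object that carries a link handler.
class FakePage
  attr_reader :body, :handler

  def initialize(body, handler)
    @body = body
    @handler = handler
  end
end

# A plain object with a #call method, used in place of a lambda.
class PassThroughHandler
  def call(link, _page)
    link
  end
end

# Dump and reload an object. Newer Psych versions need unsafe_load to
# restore arbitrary classes; older ones only have load.
def yaml_round_trip(object)
  dumped = YAML.dump(object)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(dumped) : YAML.load(dumped)
end

# True when the object survives a YAML round trip without raising.
def serializable?(object)
  yaml_round_trip(object)
  true
rescue StandardError
  false
end

with_lambda = FakePage.new("<html>", lambda { |link, _page| link })
with_object = FakePage.new("<html>", PassThroughHandler.new)

puts serializable?(with_lambda)
puts serializable?(with_object)
```

The handler object behaves the same as the lambda from the caller's point of view, since both respond to call(link, page).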
Hope this helps someone else.
Update
In response to @Martin's request for the definition of WebCache, here 'tis:
# Simple model for caching pages fetched from the web. Assumes
# a schema like this:
#
# create_table "web_caches", :force => true do |t|
# t.text "key"
# t.text "value"
# t.datetime "expires_at"
# t.datetime "created_at", :null => false
# t.datetime "updated_at", :null => false
# end
# add_index "web_caches", ["key"], :name => "index_web_caches_on_key", :unique => true
#
class WebCache < ActiveRecord::Base
serialize :value
# WebCache.with_web_cache(key) {
# ...body...
# }
#
# Searches the web_caches table for an entry with a matching key. If
# found, and if the entry has not expired, the value for that entry is
# returned. If not found, or if the entry has expired, yield to the
# body and cache the yielded value before returning it.
#
# Options:
# :expires_at sets the expiration date for this entry upon creation.
# Defaults to one year from now.
# :expired_prior_to overrides the value of 'now' when checking for
# expired entries. Mostly useful for unit testing.
#
def self.with_web_cache(key, opts = {})
serialized_key = YAML.dump(key)
expires_at = opts[:expires_at] || 1.year.from_now
expired_prior_to = opts[:expired_prior_to] || Time.zone.now
if (r = self.where(:key => serialized_key).where("expires_at > ?", expired_prior_to)).exists?
# cache hit
r.first.value
else
# cache miss
yield.tap {|value| self.create!(:key => serialized_key, :value => value, :expires_at => expires_at)}
end
end
# Prune expired entries. Typically called by a cron job.
def self.delete_expired_entries(expired_prior_to = Time.zone.now)
self.where("expires_at < ?", expired_prior_to).destroy_all
end
end
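For readers who want to try the hit/miss logic without a database, the following is a rough in-memory sketch of the same idea. MemoryWebCache and its Entry struct are hypothetical names, and a Hash stands in for the web_caches table; the option names mirror the ones documented above. Unlike the ActiveRecord version, a miss on an expired key simply overwrites the old entry.

```ruby
# In-memory sketch of the with_web_cache pattern (not the ActiveRecord model).
class MemoryWebCache
  ONE_YEAR = 365 * 24 * 60 * 60 # seconds

  Entry = Struct.new(:value, :expires_at)

  def initialize
    @entries = {}
  end

  # Returns the cached value for key if present and not expired;
  # otherwise yields, stores the yielded value, and returns it.
  def with_web_cache(key, expires_at: Time.now + ONE_YEAR, expired_prior_to: Time.now)
    entry = @entries[key]
    return entry.value if entry && entry.expires_at > expired_prior_to

    yield.tap { |value| @entries[key] = Entry.new(value, expires_at) }
  end

  # Drops every entry whose expiration time is before the given time.
  def delete_expired_entries(expired_prior_to = Time.now)
    @entries.delete_if { |_key, entry| entry.expires_at < expired_prior_to }
  end

  def size
    @entries.size
  end
end

cache = MemoryWebCache.new
fetches = 0
2.times do
  cache.with_web_cache("http://example.com/") do
    fetches += 1 # stands in for the real network fetch
    "<html>example</html>"
  end
end
puts fetches
```

The block only runs on a cache miss, which is what lets CachingMechanize#get pass `{ super }` as the body.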

Related

Paperclip and Phusion Passenger NoHandlerError

I followed this guide to get drag and drop file uploads through AJAX: http://dannemanne.com/posts/drag-n-drop_upload_that_works_with_ror_and_paperclip
Everything was working fine in my development environment with WEBrick, but if I deploy to Phusion Passenger then I get:
Paperclip::AdapterRegistry::NoHandlerError (No handler found for #<PhusionPassenger::Utils::RewindableInput:0x000000041aef38 @io=#<PhusionPassen...
I'm using this in my controller:
before_filter :parse_raw_upload, :only => :bulk_submissions
def bulk_submissions
...
@submission = Submission.create!(url: "", file: @raw_file, description: "Please edit this description", work_type: "other", date_completed: DateTime.now.to_date)
...
end
private
def parse_raw_upload
if env['HTTP_X_FILE_UPLOAD'] == 'true'
@raw_file = env['rack.input']
@raw_file.class.class_eval { attr_accessor :original_filename, :content_type }
@raw_file.original_filename = env['HTTP_X_FILE_NAME']
@raw_file.content_type = env['HTTP_X_MIME_TYPE']
end
end
Looking at the request itself all the headers are set (X_MIME_TYPE, X_FILE_NAME) etc.
Any ideas?
Thanks in advance!
The example you're cribbing from expects the file stream to be a StringIO object, but Passenger is giving you a PhusionPassenger::Utils::RewindableInput object instead.
Fortunately, a RewindableInput is duckalike to StringIO for this case, so Paperclip's StringioAdapter can be used to wrap your upload stream.
Inside the if block in your parse_raw_upload, at the end, do:
if @raw_file.class.name == 'PhusionPassenger::Utils::RewindableInput'
@raw_file = Paperclip::StringioAdapter.new(@raw_file)
end

How to determine ActiveModel::Errors validation type

With the migration from Rails 2 to Rails 3 validation errors were moved from ActiveRecord::Error to ActiveModel::Errors.
In rails 2 the validation error had a type and a message (among other things) and you could check the type of the validation error by doing something like the following:
rescue ActiveRecord::RecordInvalid => e
e.record.errors.each do |attr, error|
if error.type == :foo
do_something
end
end
end
But with Rails 3 it seems everything but the invalid attribute and message has been lost. As a result the only way to determine the type is to compare the error message:
rescue ActiveRecord::RecordInvalid => e
e.record.errors.each do |attr, error|
if error == "foobar"
do_something
end
end
end
Which is not at all ideal (eg. what if you have several validations which use the same message?).
Question:
Is there a better way in rails 3.0 to determine the type of validation error?
Check for added? on ActiveModel::Errors:
https://github.com/rails/rails/blob/master/activemodel/lib/active_model/errors.rb#L331
That allows you to do this:
record.errors.added?(:field, :error)
I needed it not only for test purposes, but also for an API. I ended up with a monkey patch:
module CoreExt
module ActiveModel
module Errors
# When validation on a model fails, ActiveModel sets only human-readable
# messages. This does not allow you to programmatically identify exactly
# which validation rule was violated.
#
# This module patches {ActiveModel::Errors} to have a +details+ property
# that keeps the names of violated validators.
#
# @example
# customer.valid? # => false
# customer.errors.messages # => { email: ["must be present"] }
# customer.errors.details # => { email: { blank: ["must be present"] } }
module Details
extend ActiveSupport::Concern
included do
if instance_methods.include?(:details)
fail("Can't monkey patch. ActiveModel::Errors already has method #details")
end
def details
@__details ||= Hash.new do |attr_hash, attr_key|
attr_hash[attr_key] = Hash.new { |h, k| h[k] = [] }
end
end
def add_with_details(attribute, message = nil, options = {})
error_type = message.is_a?(Symbol) ? message : :invalid
normalized_message = normalize_message(attribute, message, options)
details[attribute][error_type] << normalized_message
add_without_details(attribute, message, options)
end
alias_method_chain :add, :details
def clear_with_details
details.clear
clear_without_details
end
alias_method_chain :clear, :details
end
end
end
end
end
# Apply monkey patches
::ActiveModel::Errors.send(:include, ::CoreExt::ActiveModel::Errors::Details)
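The idea behind both answers, recording the error type next to the message so code can branch on it, can be sketched without Rails. TypedErrors below is a hypothetical toy class, not part of ActiveModel; it only illustrates the data shape used by the details hash in the monkey patch and an added?-style check.

```ruby
# Toy error collector that keeps the error type alongside the message.
class TypedErrors
  attr_reader :messages, :details

  def initialize
    @messages = Hash.new { |hash, attr| hash[attr] = [] }
    @details = Hash.new do |hash, attr|
      hash[attr] = Hash.new { |types, type| types[type] = [] }
    end
  end

  # Record a human-readable message under both the attribute and its type.
  def add(attribute, type, message)
    @messages[attribute] << message
    @details[attribute][type] << message
  end

  # True if an error of the given type was recorded for the attribute.
  # key? is used so that asking does not create empty entries.
  def added?(attribute, type)
    @details.key?(attribute) && @details[attribute].key?(type)
  end
end

errors = TypedErrors.new
errors.add(:email, :blank, "must be present")

puts errors.added?(:email, :blank)
puts errors.added?(:email, :taken)
```

With the type stored as a symbol, two validations that happen to share the same message text can still be told apart.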

How to use same cached page for different urls in rails?

I have two URLs that basically render the same page. The minor differences can easily be handled via JavaScript, based on location.href. However, even though both routes point to the same controller#action, the second route does not use the page cached by the first. How can I achieve this?
I have an interesting requirement on my website that is the opposite of yours: different pages can be returned from the same URL because of different themes. So I came up with a solution called "anonymous cache", in which I build my own cache key that includes the extra parameters. I think this solution can give you some clues.
module AnonymousCache
def self.included(base)
base.extend(ClassMethods)
end
module ClassMethods
def caches_page_for_anonymous(*pages)
before_filter :check_cache_for_anonymous, :only => pages
after_filter :cache_for_anonymous, :only => pages
end
end
def check_cache_for_anonymous
return unless perform_caching
return if logged_in?
path = anon_cache_path
if content = Rails.cache.read(path)
send_data(content,
:type => 'text/html;charset=utf-8', :disposition => 'inline')
return false
end
end
def cache_for_anonymous
return unless perform_caching
return if logged_in?
path = anon_cache_path
@expires_in ||= 1.hour
self.class.benchmark "Cached page for guest: #{path}" do
Rails.cache.write(path, response.body, :expires_in => @expires_in.to_i)
end
end
protected :check_cache_for_anonymous
protected :cache_for_anonymous
private
def anon_cache_path()
path1 = File.join(request.host, current_theme, request.path)
q = request.query_string
path1 = "#{path1}?#{q}" unless q.empty?
path1
end
end
The anon_cache_path method is where I build the canonical key for the page cache. You can see that I include current_theme in it.
You can copy this and change anon_cache_path according to your requirements.
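For the original question (two URLs sharing one cached page), the key function could map both paths to one canonical path before building the key. The sketch below is plain Ruby and only illustrates that idea; CANONICAL_PATHS, its example paths, and shared_cache_key are hypothetical names, not part of the module above.

```ruby
# Hypothetical alias table: paths that should share a single cached page.
CANONICAL_PATHS = {
  "/home"  => "/",
  "/index" => "/"
}.freeze

# Build a cache key from host, path and query string, after mapping
# aliased paths onto their canonical path.
def shared_cache_key(host, path, query_string = "")
  canonical = CANONICAL_PATHS.fetch(path, path)
  key = File.join(host, canonical)
  query_string.empty? ? key : "#{key}?#{query_string}"
end

puts shared_cache_key("example.com", "/home") == shared_cache_key("example.com", "/")
puts shared_cache_key("example.com", "/about") == shared_cache_key("example.com", "/")
```

In the module above, the same lookup would go inside anon_cache_path, applied to request.path.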

Mongoid dynamic finder with Mongoid::Errors::DocumentNotFound exception raised

I'm building a REST api for this project that uses Mongoid.
I've setup the following to catch the Mongoid::Errors::DocumentNotFound exception:
rescue_from Mongoid::Errors::DocumentNotFound in my base controller
In my controller I've this query code:
@current_account.users.find(:first, :conditions => {:name => "some_name"})
The above query just returns nil. It doesn't raise the exception.
Tried with another syntax as well:
User.find(:conditions => {:name => "same"}).first
All of those methods just run where internally, and AFAIK where doesn't raise an exception; it simply returns [].
So what can the solution be? I want a partially dynamic finder that also raises the exception.
I ran into the same problem today and found another solution.
Set raise_not_found_error to false, so your config/mongoid.yml should be:
development:
host: localhost
port: 10045
username: ...
password: ...
database: ...
raise_not_found_error: false
from http://mongoid.org/docs/installation/configuration.html
I believe that Mongoid will only raise a DocumentNotFound exception when using the find method by passing in an object's id (and not with conditions). Otherwise it will return nil. From the Mongoid source:
# lib/mongoid/errors/document_not_found.rb
# Raised when querying the database for a document by a specific id which
# does not exist. If multiple ids were passed then it will display all of
# those.
You will have to check manually to see if you got any results and either raise the DocumentNotFound exception yourself (not great), or raise your own custom exception (better solution).
An example of the former would be something like this:
raise Mongoid::Errors::DocumentNotFound.new(User, params[:name]) unless @current_account.users.first(:conditions => {:name => params[:name]})
Update: I haven't tested any of this, but it should allow you to make calls like the following (or at least point you in the right direction, I hope!):
@current_account.users.where!(:conditions => {:name => params[:name]})
This will throw a custom Mongoid::EmptyCollection error if the collection returned from the query is empty. Note that it's not the most efficient solution, since it has to actually run the query in order to find out whether the returned collection is empty.
Then all you need to do is rescue from Mongoid::EmptyCollection instead (or as well).
# lib/mongoid_criterion_with_errors.rb
module Mongoid
module Criterion
module WithErrors
extend ActiveSupport::Concern
module ClassMethods
def where!(*args)
criteria = self.where(*args)
raise Mongoid::EmptyCollection.new(criteria) if criteria.empty?
criteria
end
end
end
end
class EmptyCollection < StandardError
def initialize(criteria)
@class_name = criteria.class
@selector = criteria.selector
end
def to_s
"Empty collection found for #{@class_name}, using selector: #{@selector}"
end
end
end
# config/application.rb
module ApplicationName
class Application < Rails::Application
require 'mongoid_criterion_with_errors'
#...snip...
end
end
# app/models/user.rb
class User
include Mongoid::Document
include Mongoid::Timestamps
include Mongoid::Criterion::WithErrors
#...snip...
end
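Since the code above is untested, here is a small self-contained sketch of the same "bang finder" pattern in plain Ruby, with an Array of hashes standing in for a Mongoid collection. InMemoryUsers and this EmptyCollection class are hypothetical illustrations and do not use the Mongoid API.

```ruby
# Raised by where! when a query matches nothing.
class EmptyCollection < StandardError
  def initialize(selector)
    super("Empty collection found using selector: #{selector.inspect}")
  end
end

# Array-backed stand-in for a collection of user documents.
class InMemoryUsers
  def initialize(records)
    @records = records
  end

  # Returns all records matching every key/value pair; may be empty.
  def where(conditions)
    @records.select do |record|
      conditions.all? { |key, value| record[key] == value }
    end
  end

  # Same as where, but raises instead of returning an empty result.
  def where!(conditions)
    results = where(conditions)
    raise EmptyCollection.new(conditions) if results.empty?
    results
  end
end

users = InMemoryUsers.new([{ name: "alice" }, { name: "bob" }])
puts users.where!(name: "alice").length

begin
  users.where!(name: "nobody")
rescue EmptyCollection => e
  puts e.message
end
```

In a controller, the rescue would be replaced by a rescue_from declaration for the custom error class.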

How do I write a Rails 3.1 engine controller test in rspec?

I have written a Rails 3.1 engine with the namespace Posts. Hence, my controllers are found in app/controllers/posts/, my models in app/models/posts, etc. I can test the models just fine. The spec for one model looks like...
module Posts
describe Post do
describe 'Associations' do
it ...
end
... and everything works fine.
However, the specs for the controllers do not work. The Rails engine is mounted at /posts, yet the controller is Posts::PostsController. Thus, the tests look for the controller route to be posts/posts.
describe "GET index" do
it "assigns all posts as @posts" do
Posts::Post.stub(:all) { [mock_post] }
get :index
assigns(:posts).should eq([mock_post])
end
end
which yields...
1) Posts::PostsController GET index assigns all posts as @posts
Failure/Error: get :index
ActionController::RoutingError:
No route matches {:controller=>"posts/posts"}
# ./spec/controllers/posts/posts_controller_spec.rb:16
I've tried all sorts of tricks in the test app's routes file... :namespace, etc, to no avail.
How do I make this work? It seems like it won't, since the engine puts the controller at /posts, yet the namespacing puts the controller at /posts/posts for the purpose of testing.
I'm assuming you're testing your engine with a dummy rails app, like the one that would be generated by enginex.
Your engine should be mounted in the dummy app:
In spec/dummy/config/routes.rb:
Dummy::Application.routes.draw do
mount Posts::Engine => '/posts-prefix'
end
My second assumption is that your engine is isolated:
In lib/posts.rb:
module Posts
class Engine < Rails::Engine
isolate_namespace Posts
end
end
I don't know if these two assumptions are really required, but that is how my own engine is structured.
The workaround is quite simple. Instead of this:
get :show, :id => 1
use this
get :show, {:id => 1, :use_route => :posts}
The :posts symbol should be the name of your engine and NOT the path where it is mounted.
This works because the get method parameters are passed straight to ActionDispatch::Routing::RouteSet::Generator#initialize (defined here), which in turn uses #named_route to get the correct route from Rack::Mount::RouteSet#generate (see here and here).
Plunging into the rails internals is fun, but quite time consuming, I would not do this every day ;-) .
HTH
I worked around this issue by overriding the get, post, put, and delete methods that are provided, making it so they always pass use_route as a parameter.
I used Benoit's answer as a basis for this. Thanks buddy!
module ControllerHacks
def get(action, parameters = nil, session = nil, flash = nil)
process_action(action, parameters, session, flash, "GET")
end
# Executes a request simulating POST HTTP method and set/volley the response
def post(action, parameters = nil, session = nil, flash = nil)
process_action(action, parameters, session, flash, "POST")
end
# Executes a request simulating PUT HTTP method and set/volley the response
def put(action, parameters = nil, session = nil, flash = nil)
process_action(action, parameters, session, flash, "PUT")
end
# Executes a request simulating DELETE HTTP method and set/volley the response
def delete(action, parameters = nil, session = nil, flash = nil)
process_action(action, parameters, session, flash, "DELETE")
end
private
def process_action(action, parameters = nil, session = nil, flash = nil, method = "GET")
parameters ||= {}
process(action, parameters.merge!(:use_route => :my_engine), session, flash, method)
end
end
RSpec.configure do |c|
c.include ControllerHacks, :type => :controller
end
Use the rspec-rails routes directive:
describe MyEngine::WidgetsController do
routes { MyEngine::Engine.routes }
# Specs can use the engine's routes & named URL helpers
# without any other special code.
end
– RSpec Rails 2.14 official docs.
Based on this answer I chose the following solution:
#spec/spec_helper.rb
RSpec.configure do |config|
# other code
config.before(:each) { @routes = UserManager::Engine.routes }
end
An additional benefit is that you don't need a before(:each) block in every controller spec.
Solution for a problem when you don't have or cannot use isolate_namespace:
module Posts
class Engine < Rails::Engine
end
end
In controller specs, to fix routes:
get :show, {:id => 1, :use_route => :posts_engine}
Rails adds _engine to your app routes if you don't use isolate_namespace.
I'm developing a gem for my company that provides an API for the applications we're running. We're using Rails 3.0.9 still, with latest Rspec-Rails (2.10.1). I was having a similar issue where I had defined routes like so in my Rails engine gem.
match '/companyname/api_name' => 'CompanyName/ApiName/ControllerName#apimethod'
I was getting an error like
ActionController::RoutingError:
No route matches {:controller=>"company_name/api_name/controller_name", :action=>"apimethod"}
It turns out I just needed to redefine my route in underscore case so that RSpec could match it.
match '/companyname/api_name' => 'company_name/api_name/controller_name#apimethod'
I guess RSpec controller tests use a reverse lookup based on underscore case, whereas Rails will set up and interpret the route whether you define it in camel case or underscore case.
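To illustrate the conversion involved, the following plain-Ruby sketch turns a camel-case controller path into the underscore form that the error message shows. underscore_path is a hypothetical helper written for this example; in a Rails app, ActiveSupport's String#underscore does a similar job.

```ruby
# Convert each "/"-separated segment from CamelCase to snake_case.
def underscore_path(camel_path)
  camel_path.split("/").map do |segment|
    # Insert "_" between a lowercase letter or digit and an uppercase letter.
    segment.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end.join("/")
end

puts underscore_path("CompanyName/ApiName/ControllerName")
# prints "company_name/api_name/controller_name"
```

Writing the route target directly in this underscore form avoids depending on how the lookup normalizes names.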
Adding routes { MyEngine::Engine.routes } was already mentioned, but it's also possible to specify this once for all controller tests:
# spec/support/test_helpers/controller_routes.rb
module TestHelpers
module ControllerRoutes
extend ActiveSupport::Concern
included do
routes { MyEngine::Engine.routes }
end
end
end
and use in rails_helper.rb:
RSpec.configure do |config|
config.include TestHelpers::ControllerRoutes, type: :controller
end