Rails 3 - cache web service call - ruby-on-rails-3

In my application, in the homepage action, I call a specific web service that returns JSON.
parsed = JSON.parse(open("http://myservice").read)
@history = parsed['DATA']
This data will not change more than once per 60 seconds and does not change on a per-visitor basis, so I would ideally like to cache the @history variable itself (since parsing will not produce a new result) and have it automatically invalidated once it is more than a minute old.
I'm unsure of the best way to do this. The default Rails caching methods all seem to be more oriented towards content that needs to be manually expired. I'm sure there is a quick and easy way to do this; I just don't know what it is!

You can use the built in Rails cache for this:
@history = Rails.cache.fetch('parsed_myservice_data', :expires_in => 1.minute) do
  JSON.parse connector.get_response("http://myservice")
end
One problem with this approach arises when rebuilding the cached data takes quite a long time. If you get many client requests during that window, each of them will get a cache miss and call your block, resulting in lots of duplicated effort, not to mention slow response times.
EDIT: In Rails 3.x you can pass the option :race_condition_ttl to the fetch method to avoid this problem. Read more about it here.
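For illustration, here is a minimal sketch combining both options (the key name and timings are just the ones used above; open-uri is assumed to be required, as in the question):
# Sketch: when the entry expires, one request recomputes it while other
# requests keep serving the slightly stale value for up to 10 extra seconds.
require 'open-uri'

@history = Rails.cache.fetch('parsed_myservice_data',
                             :expires_in => 1.minute,
                             :race_condition_ttl => 10.seconds) do
  JSON.parse(open("http://myservice").read)
end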
A good solution to this in previous versions of Rails is to set up a background/cron job that runs at regular intervals, fetches and parses the data, and updates the cache.
In your controller or model:
@history = Rails.cache.fetch('parsed_myservice_data') do
  JSON.parse connector.get_response("http://myservice")
end
In your background/cron job:
Rails.cache.write('parsed_myservice_data',
                  JSON.parse(connector.get_response("http://myservice")))
This way, your client requests will always get fresh cached data (except for the first request, if the background/cron job hasn't run yet).
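A sketch of such a background job, assuming a rake task triggered by cron (the task name, file path, and schedule are hypothetical):
# lib/tasks/refresh_history.rake (hypothetical path)
# crontab entry, e.g.: * * * * * cd /path/to/app && bundle exec rake cache:refresh_history
require 'open-uri'

namespace :cache do
  desc "Re-fetch the web service data and rewrite the cache entry"
  task :refresh_history => :environment do
    data = JSON.parse(open("http://myservice").read)
    Rails.cache.write('parsed_myservice_data', data)
  end
end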

I don't know of an easy railsy way of doing this. You might want to look into using Redis. Redis lets you set expiration times on the data you store in it. Depending on which redis gem you use, it'd look something like this:
@history = $redis.get('history')
if @history.nil?
  @history = JSON.parse(open("http://myservice").read)['DATA']
  $redis.set('history', @history)
  $redis.expire('history', 60)
end
Because there's only one Redis service, this will work for all your Rails processes.
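One caveat with the snippet above: Redis stores strings, so the parsed hash needs to be serialized, and SETEX can set the value and the 60-second TTL in a single call. A sketch along those lines (same key and URL as above):
# Sketch: keep the cached payload as JSON text in Redis and parse it on read.
if (cached = $redis.get('history'))
  @history = JSON.parse(cached)
else
  @history = JSON.parse(open("http://myservice").read)['DATA']
  # setex = SET with an expiry in seconds, done in one atomic call
  $redis.setex('history', 60, @history.to_json)
end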

We had a similar requirement and we ended up using Squid as a forward proxy for all the webservice calls from the rails server. Squid was configured to have a cache-expiry time of 60 seconds.
http_connection_factory.rb:
class HttpConnectionFactory
  def self.connection
    AppConfig.use_forward_proxy ? Net::HTTP::Proxy(AppConfig.forward_proxy_host, AppConfig.forward_proxy_port) : Net::HTTP
  end
end
In your application's home page action, you can use the proxy instead of making the call directly.
connector = HttpConnectionFactory.connection
parsed = JSON.parse(connector.get_response(URI("http://myservice")).body)
@history = parsed['DATA']
We had second thoughts about using Redis or Memcache, but we had several service calls and wanted to avoid the hassle of generating keys and sweeping them at the appropriate times.
So, in our case, the forward proxy took care of all those nitty-gritty details. Please refer to the Squid wiki for the necessary configuration parameters.

Related

Enable Impala Impersonation on Superset

Is there a way to make the logged-in user (on Superset) run the queries on Impala?
I tried to enable the "Impersonate the logged on user" option on Databases, but with no success: all the queries still run on Impala as the superset user.
I'm trying to achieve the same! This will not completely answer the question since it still does not work, but I want to share my research in order to maybe help another soul trying to use this tool outside very basic use cases.
I went deep into the code and found out that impersonation is not implemented for Impala, so you cannot achieve this from the UI. I found this PR https://github.com/apache/superset/pull/4699, which for whatever reason was never merged into the codebase, and tried to copy & paste its code into my Superset version (1.1.0), but it didn't work. Adding some logs, I can see that the configuration with the impersonation is updated, but the actual Impala query still runs as the user I used to start the process.
As you can imagine, I am a complete noob at this. However, I found out that the impersonation happens when you create a cursor, and there is a constructor parameter through which you can pass the impersonation configuration.
I managed to correctly (at least to my understanding) implement impersonation for the SQL lab part.
In sql_lab.py you have to add the following lines to the execute_sql_statements method:
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
where cursor_kwargs is defined in db_engine_specs/impala.py as follows:
@classmethod
def get_configuration_for_impersonation(cls, uri, impersonate_user, username):
    logger.info(
        'Passing Impala execution_options.cursor_configuration for impersonation')
    return {'execution_options': {
        'cursor_configuration': {'impala.doas.user': username}}}

@classmethod
def get_cursor_configuration_for_impersonation(cls, uri, impersonate_user,
                                               username):
    logger.debug('Passing Impala cursor configuration for impersonation')
    return {'configuration': {'impala.doas.user': username}}
Finally, in models/core.py you have to add the following bit in the get_sqla_engine def
params = extra.get("engine_params", {})  # that was already there, just for you to find the line
self.cursor_kwargs = self.db_engine_spec.get_cursor_configuration_for_impersonation(
    str(url), self.impersonate_user, effective_username)  # this is the line I added
...
params.update(self.get_encrypted_extra())  # already there
# new stuff
configuration = {}
configuration.update(
    self.db_engine_spec.get_configuration_for_impersonation(
        str(url),
        self.impersonate_user,
        effective_username))
if configuration:
    params.update(configuration)
As you can see, I just shamelessly pasted the code from the PR. However, as I already said, this only sort of works for SQL Lab. For the dashboards there is an entirely different way of querying Impala that I have not yet figured out.
This means that queries for the dashboards are handled in a different way and there isn't something like this
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
My gut (and debugging) feeling is that you first need to understand the SQLAlchemy part and extend a new ImpalaEngine class that uses a custom cursor with the impersonation configuration. Or something like that; however, it is not as simple (if we want to call this simple) as the sql_lab part. So, the trick is to find out where the query is executed and create a cursor with the impersonation configuration. Easy, isn't it?
I hope this sheds some light for you and the others who have this issue. Let me know if you find another way to solve it, or if this comment was useful.
Update: something really useful
A colleague of mine successfully implemented impersonation with Impala without touching anything Superset-related, instead working directly with the impyla lib. A PR was opened with the code to change. You can apply the patch directly in the impyla source used by Superset; you have to edit both dbapi.py and hiveserver2.py.
As a reminder: we are still testing this and we do not know if it works with different accounts using the same superset instance.

Why does MembershipProvider.GetUser consume so many resources, and how can I ensure it uses the EF db context?

In our web application, we observed the following:
GetUser/CreatMembershipEntities/ExplicitLoadFromAssembly seems quite expensive.
Also noticing that CreateEntityConnection is being called - EntityFramework?
I'm not entirely convinced that EF was configured correctly for this application. If it was, and was in use, I wouldn't expect to see new connections to be initiated for every call - yes/no?
Is there a way to streamline this and avoid some major code refactoring?
The use of System.Web.Security.Membership.GetUser() seems like a biggie here. Instead of using the MembershipProvider to create users, what about just executing a sproc that does the same thing?
I have found the following code, which is ridiculous as far as I am concerned, since it causes a new GetUser call for each candidate username until a unique one is generated:
For i As Integer = 1 To Integer.MaxValue
    'Generate unique username
    If System.Web.Security.Membership.GetUser(userName & i) Is Nothing Then
        'Increment value until no duplicate username found
        userName = userName & i
        Exit For
    End If
Next
-- UPDATE --
I have modified the question slightly...
We were able to run up to 20 users; then the IIS server would tank. Does the GetUser() method create a brand new connection every time? It looks like it, based on the results. How can I ensure that this GetUser() thing is actually using the db context, rather than spinning up its own connections?

Ruby mongodb: Three newly created objects don't appear to exist during testing

I'm using mongodb to store some data. Then I have a function that gets the object with the latest timestamp and one with the oldest. I haven't experienced any issues with this method during development or production, but when I try to implement a test for it, the test fails approx 20% of the time. I'm using rspec to test this method and I'm not using mongoid or mongomapper. I create three objects with different timestamps but get a nil response, since my dataset contains 0 objects. I have read a lot of articles about write_concern and how "unsafe writes" might be the problem, but I have tried almost all the different combinations of these parameters (w, fsync, j, wtimeout) without any success. Does anyone have any idea how to solve this issue? Perhaps I have focused too much on the write_concern track and the problem lies somewhere else.
This is the method that fetches the latest and oldest timestamp.
def first_and_last_timestamp(customer_id, system_id)
  last = collection(customer_id).
    find({sid: system_id}).
    sort(["t", Mongo::DESCENDING]).
    limit(1).next()
  first = collection(customer_id).
    find({sid: system_id}).
    sort(["t", Mongo::ASCENDING]).
    limit(1).next()
  { min: first["t"], max: last["t"] }
end
I'm inserting data using this method, where data is a JSON object.
def insert(customer_id, data)
  collection(customer_id).insert(data)
end
I have reverted to using the default for setting up my connection:
Mongo::MongoClient.new(mongo_host, mongo_port)
I'm using the gem mongo (1.10.2). I'm not using any fancy setup for my mongo database. I've just installed mongo using brew on my mac and started it. The version of my mongo database is v2.6.1.
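For reference, the 1.x mongo gem also lets you set the write concern explicitly on the client; a minimal sketch (the option values here are only an example, using the same parameters mentioned in the question):
# Sketch: ask the driver to wait until the write is acknowledged (:w => 1)
# and journaled (:j => true) before insert returns, so a following query sees it.
client = Mongo::MongoClient.new(mongo_host, mongo_port, :w => 1, :j => true)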

Symfony2 performance tweaking

Symfony2 was looking so promising, powerful and flexible. So we were going to use Symfony2 + MongoDB for one of our projects. But it appeared too slow (Apache/2.2.25 + PHP/5.4.20). Currently the app is pretty simple, but I have noticed that httpd.exe loads the CPU up to 28% when some simple page is loaded. The page is quite light - just user profile info and the list of the user's posts. I can't even imagine how hundreds of users could be served (not even talking about numbers like 100k users) if performance doesn't get much better.
For comparison, the CPU load is 2% when opening the heavy 'products' page of an ActivationCloud account (which fetches a good amount of data) (PHP+Smarty+SQL).
After taking a look at the Xdebug output, I found that a great deal of time (20%) is spent in ClassLoader->loadClass(...) - 265 calls.
After performing the following steps:
*generated class map
php composer.phar dump-autoload --optimize
*installed and enabled APC
[APC]
extension=php_apc.dll
apc.enabled=1
apc.shm_segments=1
;32M per WordPress install
apc.shm_size=128M
;Relative to the number of cached files (you may need to
;watch your stats for a day or two to find out a good number)
apc.num_files_hint=7000
;Relative to the size of WordPress
apc.user_entries_hint=4096
;The number of seconds a cache entry is allowed to idle
;in a slot before APC dumps the cache
apc.ttl=7200
apc.user_ttl=7200
apc.gc_ttl=3600
;Setting this to 0 will give you the best performance, as APC will
;not have to check the IO for changes. However, you must clear
;the APC cache to recompile already cached files. If you are still
;developing, updating your site daily in WP-ADMIN, and running W3TC
;set this to 1
apc.stat=1
;This MUST be 0, WP can have errors otherwise!
apc.include_once_override=0
;Only set to 1 while debugging
apc.enable_cli=0
;Allow 2 seconds after a file is created before
;it is cached to prevent users from seeing half-written/weird pages
apc.file_update_protection=2
;Leave at 2M or lower. WordPress doesn't have any file sizes close to 2M
apc.max_file_size=2M
;Ignore files
apc.filters = "/var/www/apc.php"
apc.cache_by_default=1
apc.use_request_time=1
apc.slam_defense=0
apc.mmap_file_mask=/var/www/temp/apc.XXXXXX
apc.stat_ctime=0
apc.canonicalize=1
apc.write_lock=1
apc.report_autofilter=0
apc.rfc1867=0
apc.rfc1867_prefix =upload_
apc.rfc1867_name=APC_UPLOAD_PROGRESS
apc.rfc1867_freq=0
apc.rfc1867_ttl=3600
apc.lazy_classes=0
apc.lazy_functions=0
I expected a miracle after that, but it did not happen.
*enabled the APC class loader - in Symfony\web\app.php, uncommented:
$loader = new ApcClassLoader('sf2', $loader);
$loader->register(true);
The ClassLoader->loadClass(...) got better: 'Self' is 11 instead of 21.
Frankly speaking, I was shocked by what I saw in Xdebug :( a lot of repetitive calls like Container->get(...) - 317 calls, DocumentManager->getClassMetadata(...) - 301 calls. In total, more than 2k function calls. Hard to believe.
These bundles are installed:
class AppKernel extends Kernel
{
    public function registerBundles()
    {
        $bundles = array(
            new Symfony\Bundle\FrameworkBundle\FrameworkBundle(),
            new Symfony\Bundle\SecurityBundle\SecurityBundle(),
            new Symfony\Bundle\TwigBundle\TwigBundle(),
            new Symfony\Bundle\MonologBundle\MonologBundle(),
            new Symfony\Bundle\SwiftmailerBundle\SwiftmailerBundle(),
            new Symfony\Bundle\AsseticBundle\AsseticBundle(),
            new Doctrine\Bundle\DoctrineBundle\DoctrineBundle(),
            new Doctrine\Bundle\MongoDBBundle\DoctrineMongoDBBundle(),
            new Sensio\Bundle\FrameworkExtraBundle\SensioFrameworkExtraBundle(),
            new HWI\Bundle\OAuthBundle\HWIOAuthBundle(),
            new Knp\Bundle\MenuBundle\KnpMenuBundle(),
            // ... our bundles ...
        );
        if (in_array($this->getEnvironment(), array('dev', 'test'))) {
            $bundles[] = new Symfony\Bundle\WebProfilerBundle\WebProfilerBundle();
            $bundles[] = new Sensio\Bundle\DistributionBundle\SensioDistributionBundle();
            $bundles[] = new Sensio\Bundle\GeneratorBundle\SensioGeneratorBundle();
        }
        return $bundles;
    }
}
It was sad to find that Symfony2 got one of the worst benchmark results among other PHP frameworks: http://www.techempower.com/benchmarks/#section=data-r8&hw=i7&test=json&l=sg
At the same time, Francois Zaninotto said in his blog http://symfony.com/blog/who-really-uses-symfony that Yahoo uses Symfony2 for its bookmarks service. I tried some apps from the list http://trac.symfony-project.org/wiki/ApplicationsDevelopedWithSymfony and they don't look slow. Also, on Quora http://www.quora.com/Who-is-using-Symfony2-in-production it is said that Dailymotion is using it as well.
How to make the performance acceptable?
Got Symfony working x10 faster after adding the
realpath_cache_size = 4096k
to php.ini
First, you should use Linux (you mentioned httpd.exe, so I think you are using Windows). Then you should use nginx instead of Apache, PHP 5.5 with FPM instead of mod_php, and OPcache instead of APC (by the way, apc.stat should be turned off). Doctrine caches should be turned on, and then you should use HTTP caching wherever you can. (You can look at Packagist's code for some hints.)

How to use multiple caches in rails? (for real)

I'd like to use 2 caches -- the in memory default one and a memcache one, though abstractly it shouldn't matter (I think) which two.
The in memory default one is where I want to load small and rarely changing data. I've been using the memory one to date. I keep a bunch of 'domain data' type stuff from the database in there, I also have some small data from external sources that I refresh every 15 min - 1 hour.
I recently added memcache because I'm now serving up some larger assets. Sort of complex how I got into this, but these are larger (~kilobytes), relatively small in quantity (hundreds), and highly cacheable -- they change, but a refresh once per hour is probably too much. This set might grow, but it's shared across all hosts. Refreshes are expensive.
The first set of data has been using the default memory cache for a while now, and has been well-behaved. Memcache is perfect for the second set of data.
I've tuned memcache, and it's working great for the second set of data. The problem is that because of my existing code that was done 'thinking' it was in local memory, I'm doing several trips to memcache per request, which is increasing my latency.
So, I want to use 2 caches. Thoughts?
(note: memcache is running on different machine(s) than my server. Even if I ran it locally, I have a fleet of hosts so it wouldn't be local to all. Also, I want to avoid needing to just get bigger machines. Even though I probably could solve this problem by making the memory bigger and just using the in memory (the data really isn't that big), this doesn't solve the problem as I scale, so it will just be kicking the can.)
ActiveSupport::Cache::MemoryStore is what you want to use. Rails.cache uses either MemoryStore, FileStore or in my case DalliStore :-)
You can have a global instance of ActiveSupport::Cache::MemoryStore and use it, or create a class with a singleton pattern that holds this object (cleaner). Set Rails.cache to the other cache store and use this singleton for the MemoryStore.
Below is this class:
require 'singleton'

module Caching
  class MemoryCache
    include Singleton

    # create a private instance of MemoryStore
    def initialize
      @memory_store = ActiveSupport::Cache::MemoryStore.new
    end

    # this will allow our MemoryCache to be called just like Rails.cache;
    # every method passed to it will be forwarded to our MemoryStore
    def method_missing(m, *args, &block)
      @memory_store.send(m, *args, &block)
    end
  end
end
This is how to use it:
Caching::MemoryCache.instance.write("foo", "bar")
=> true
Caching::MemoryCache.instance.read("foo")
=> "bar"
Caching::MemoryCache.instance.clear
=> 0
Caching::MemoryCache.instance.read("foo")
=> nil
Caching::MemoryCache.instance.write("foo1", "bar1")
=> true
Caching::MemoryCache.instance.write("foo2", "bar2")
=> true
Caching::MemoryCache.instance.read_multi("foo1", "foo2")
=> {"foo1"=>"bar1", "foo2"=>"bar2"}
In an initializer you can just put:
MyMemoryCache = ActiveSupport::Cache::MemoryStore.new
Then you can use it like this:
MyMemoryCache.fetch('my-key') { 'my-value' }
and so on.
Note that if it's just for performance optimization (and depends on time expiration), it may not be a bad idea to disable it in your test environment, as follows:
if Rails.env.test?
  MyMemoryCache = ActiveSupport::Cache::NullStore.new
else
  MyMemoryCache = ActiveSupport::Cache::MemoryStore.new
end
Rails already provides this by allowing you to set different values for config.cache_store in your environment initializers.
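For example, a minimal sketch of that setup, with Rails.cache backed by memcached in production while a separate MemoryStore holds the small in-process data (the host name and expiry value are placeholders):
# config/environments/production.rb (hypothetical memcached host)
config.cache_store = :mem_cache_store, "memcache-host:11211"

# config/initializers/memory_cache.rb - a second, in-process cache
MyMemoryCache = ActiveSupport::Cache::MemoryStore.new(:expires_in => 15.minutes)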