Rails - speed and time - ruby-on-rails-3

I have a question about the "speed of logic" across MVC.
Suppose the same code, something like the following, appears in a model, in a view, and in a controller.
1) Is the speed of evaluating the logic and running the query the same in all three (M-V-C)?
Pseudocode
x = Model.where("a > ? AND b < ?", 3, 9).first.a
y = Model.sum(:a)
z = (x / y) * 2310.0
d = Date.today - 5
This is a "stupid" pseudocode, but I want to know the performance of line of code most used by my app (calling a where query, calling a sum (aggregate) query, do some math, playing with date)
The problem, is that my pages are a bit so slowly to load. I have deplaced all that manage queries in the Models and add indexes. Maybe adding the caching can solve a little the problem (but I use Highcharts that I think can't be cached).
2) How can I find where is the code bottleneck (that slow the loading of pages)?

You can use some well-known tools to profile your controllers/actions/views/models.
NewRelic (a good agent for tracking where your time goes in a distributed manner; I'd prefer this one)
Librato (an agent you can push your metrics to whenever a controller/action is hit; it can aggregate results over a period of time)
The Rails log outputs the distribution of time spent in the controller, views, and ActiveRecord. You can definitely track some good things there.
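For a rougher, more local measurement than those services give, Ruby's built-in Benchmark module can time individual statements from the Rails console. A minimal sketch, reusing the hypothetical Model from the question's pseudocode:

require 'benchmark'

puts Benchmark.measure { Model.where("a > ? AND b < ?", 3, 9).first }
puts Benchmark.measure { Model.sum(:a) }

Timing the same snippet from a model, a view, and a controller would also answer question 1 empirically: the cost of the Ruby and the SQL should be essentially the same wherever it runs; what differs is how often each layer ends up running it.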

Related

Sequence number field with concurrent writes

Imagine a rail track, and my goal is to store every railcar that exists on that track. Each railcar has a position. Say there are 100 railcars on the track, so each railcar object would have a TrackPosition from 1-100.
That is essentially what we are doing right now, with a Track having child Railcars, and each Railcar has an integer TrackPosition.
When a new railcar is added, we simply take the # of cars in the track + 1 to save the position of the new car.
We are running into issues in a few different areas:
We would like to add cars concurrently using AWS Lambda. This presents a problem, as two functions could hit the line of logic that calculates "total cars on track + 1" at the same time; when they go to save, both cars would have the same position. Locking that bit of code is not possible within AWS Lambda (as far as I can tell from what I've read). We've resolved this for the time being by setting the Lambdas to fire synchronously (concurrency set to 1), which is obviously not ideal for performance.
We would like to add a car into the middle of a track. This would involve taking every car with a greater position and incrementing it. It's not difficult to write code to do this, but...
I'm wondering if I'm missing something fundamental in SQL that can achieve what I'm after in a less error-prone way. The way I'm doing it seems naive. I've looked into sequences, but I'm not sure they would solve my concurrency issue.
Any insight would be greatly appreciated. We are using Entity Framework Core 2 with SQL Server.
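The race described above (read the count, then save) is usually fixed by computing the position inside the same database transaction that inserts the row, while holding a lock on the parent track so concurrent writers serialize; a unique index on (TrackId, TrackPosition) then acts as a safety net that turns any remaining collision into a retryable error. The question's stack is EF Core 2 with SQL Server, where the same idea is a transaction plus an UPDLOCK hint; since this page's other examples are Rails-flavored, here is a sketch of the pattern in ActiveRecord-style Ruby with illustrative names:

class Railcar < ActiveRecord::Base
  belongs_to :track

  # Appends a car at the end of the track. track.lock! issues a
  # SELECT ... FOR UPDATE on the track row, so a second concurrent
  # appender blocks here until this transaction commits, and then
  # sees the updated count.
  def self.append_to(track)
    transaction do
      track.lock!
      create!(track: track, position: track.railcars.count + 1)
    end
  end
end

Inserting into the middle has the same shape: lock the track, run a single UPDATE that shifts position up by one for every car at or past the insertion point, then insert the new car.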

How can I speed up this query in a Rails app?

I need help optimizing a series of queries in a Rails 5 app. The following explains what I am doing, but if it isn't clear let me know and I will try to go into better detail.
I have the following methods in my models:
In my IncomeReport model:
class IncomeReport < ApplicationRecord
  def self.net_incomes_2015_totals_collection
    all.map(&:net_incomes_2015).compact
  end

  def net_incomes_2015
    incomes - producer.expenses_2015
  end

  def incomes
    total_yield * 1.15
  end
end
In my Producer model I have the following:
class Producer < ApplicationRecord
  def expenses_2015
    expenses.sum(&:expense_per_ha)
  end
end
In the Expense model I have:
class Expense < ApplicationRecord
  def expense_per_ha
    total_cost / area
  end
end
In the controller I have this
(I am using a gem called descriptive_statistics to get min, max, quartiles, etc., in case you are wondering about that part at the end)
@income_reports_2015 = IncomeReport.net_incomes_2015_totals_collection.extend(DescriptiveStatistics)
Then in my view I use
<%= @income_reports_2015.descriptive_statistics[:min] %>
This code works when there are only a few objects in the database. However, now that there are thousands, the query takes forever to give a result. It takes so long that it times out!
How can I optimize this to get the most performant outcome?
One approach might be to architect your application differently. I think a service-oriented architecture might be of use in this circumstance.
Instead of querying when the user goes to this view, you might want to use a worker to query intermittently and then write to a CSV. When a user navigates to this view, you read from the CSV instead. This runs much faster because, instead of doing the query then and there (when the user navigates to the page), you're simply reading from a file that was created beforehand by a background process.
Obviously, this has its own set of challenges, but I've done this in the past to solve a similar problem. I wrote an app that fetched data from 10 different external APIs once a minute. The 10 fetches resulted in 10 objects in the db, so 10 * 60 * 24 = 14,400 records in the DB per day. When a user loaded the page requiring this data, they would load 7 days' worth of records: 100,800 database rows. I ran into the same problem, where the query being done at runtime resulted in a timeout, so I wrote to a CSV and read from it as a workaround.
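A minimal sketch of that worker-plus-CSV shape, with illustrative class and column names (the original app's details aren't given):

require 'csv'

# Run periodically (cron, sidekiq-scheduler, etc.). Writes a snapshot
# that the controller can read instead of querying at request time.
class SnapshotExporter
  PATH = Rails.root.join("tmp", "readings_snapshot.csv")

  def self.export
    CSV.open(PATH, "w") do |csv|
      csv << %w[recorded_at value]
      Reading.where("created_at > ?", 7.days.ago)
             .pluck(:created_at, :value)
             .each { |row| csv << row }
    end
  end
end

# In the controller, something like:
#   rows = CSV.read(SnapshotExporter::PATH, headers: true)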
What's the structure of IncomeReport? Looking at the code, your problem lies in the all inside net_incomes_2015_totals_collection: all hits the database and returns every record, and then you map over them in Ruby. Overkill. Try to filter the data, query less, select less, and get all the info you want directly with ActiveRecord. Ruby loops slow things down.
So, without knowing the table structure and its data, I'd do the following:
def self.net_incomes_2015_totals_collection
  where(created_at: Date.new(2015, 1, 1)..Date.new(2015, 12, 31))
    .where.not(net_incomes_2015: nil)
    .pluck(:net_incomes_2015)
end
Plus, I'd make sure there's a composite index on created_at and net_incomes_2015.
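For reference, that composite index could be added with a migration along these lines (assuming the table is income_reports and net_incomes_2015 really is a stored column, which the question's code leaves unclear):

class AddIncomeReportIndex < ActiveRecord::Migration[5.0]
  def change
    add_index :income_reports, [:created_at, :net_incomes_2015]
  end
end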
It will probably still be slow, but better than it is now. You should think about aggregating the data in the background (Resque, Sidekiq, etc.) at midnight (and caching it?).
Hope it helps.
It looks like you have a few N+1 queries here. Each report grabs its producer in an individual query. Then each producer grabs each of its expenses in a further query.
You could avoid the first issue by using preload(:producer) instead of the bare all. However, the sums later will be harder to avoid, since sum automatically fires a query.
You can avoid that issue with something like
def self.net_incomes_2015_totals_collection
  joins(producer: :expenses).
    select(:id, 'income_reports.total_yield * 1.15 - SUM(expenses.total_cost / expenses.area) AS net_incomes_2015').
    group(:id).
    map(&:net_incomes_2015).
    compact
end
to get everything in one query.

Object condition in multiple places/repeated code (DRY)

This is a fundamental application design question I've struggled with and flip-flopped on for years. We have a legacy webapp that doesn't really have a solid ORM, if that tidbit might influence your answer. To abstract my question, let's say we have a class Car and a corresponding table in our database named car. Car has a few properties: color, weight, year, maxspeed. These properties directly correspond to columns in the db table.
In our application, we define the car as “classic old” if year is < 1960 and color = black. And in many places within our app knowing whether the car is "classic old" is extremely important (maybe we’re running a very illogical insurance agency which gives steep discounts and other perks to cars which are “classic old”).
All over our application, we do things like:
--list all classic old cars
--give the current user a discount if their car is classic old
--list all classic old cars with max speed > 100 miles per hour
--email the current user if their car is classic old and weights more than 1000 pounds
What is the best way to go about this? We have a legacy application that does this in some places:
getOldClassicCars()
  select * from car where year < 1960 and color = 'black'
and in other places:
cararray = getAllCars()
for each car in cararray
  if car.year < 1960 and car.color == 'black'
    oldcararray.add(car)
The point is that this very important, fundamental piece of our application (is the car "classic old"?) is "hardcoded" as year < 1960 and color = black in many places: sometimes in SQL, sometimes in application code, and so on. Obviously that is not good, but as we've refactored things I'm not sure we've been refactoring them the best way we can.
Well, you are stuck with the fundamental problem that:
you can't run your code on the database;
you want to be able to use the database's selection functionality on this criterion;
you want the calculation of "classic old" to be defined in a single place (preferably code).
Let's enumerate the solutions.
1: Put the calculation in a sproc and always use the sproc to retrieve cars.
The problem here is that if you create a new car in code, its classic status is undefined, so you haven't really solved the "not in two places" problem.
2: Get the DB to run your calculation via an assembly. For example, you can get MSSQL to run functions from a .NET assembly, which you can also use in your code base to perform the same calculation.
Problem: it's hard work. Plus, essentially it's still in two places; you have to keep the db up to date and ensure the table is accessed correctly.
3: Persist the calculated value in the DB, but perform the calculation in the code.
Problem: if the calculation changes, the DB values will be incorrect and need updating.
3 seems to be the best option, as we will know when the calculation changes and be able to take some action to resolve the situation.
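Option 3 is straightforward to sketch. The question's app lacks a solid ORM, so treat this ActiveRecord-style version purely as an illustration of the shape, with an assumed is_classic_old column:

class Car < ActiveRecord::Base
  before_save :refresh_classic_flag

  private

  # The rule lives in code; the persisted flag lets plain SQL filter on it.
  def refresh_classic_flag
    self.is_classic_old = year < 1960 && color == "black"
  end
end

If the rule changes, a one-off backfill (Car.find_each(&:save), or a bulk UPDATE) brings the stored flags back in line.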
However, given the fundamental nature of this calculation, it might be best to make that "out of dateness" implicit in the way we structure the code.
Instead of simply persisting car.IsClassic, we could add a CarStatusReport object with a datetime property. We then generate a CarStatusReport(2017), which evaluates all the cars at that point in time and saves that data in a separate table.
Our business logic is then no longer "Is this car a classic?" but "What does the latest CarStatusReport say the status of this car is?"
Your business logic will then reside in a single CarStatusReportGenerator service, and any other logic accessing the IsClassic calculation will be forced to acknowledge the ephemeral nature of the stored info.
There's no optimal solution here. But one good step is to move all the business logic into one place. If you can't (say, when you write methods or functions calculating some property, for example isOld()), then hide all those inconsistencies under the hood, so that users of the implementation will (conceptually) never notice the DRY violation from outside.
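As a concrete sketch of "one place", you can define the thresholds once and derive both the database filter and the in-memory predicate from them. Again this is ActiveRecord-flavored purely for illustration, since the question's app has no solid ORM:

class Car < ActiveRecord::Base
  CLASSIC_YEAR_CUTOFF = 1960
  CLASSIC_COLOR = "black"

  # The SQL filter and the Ruby check share the same constants, so the
  # rule is edited in exactly one class.
  scope :classic_old, -> { where("year < ? AND color = ?", CLASSIC_YEAR_CUTOFF, CLASSIC_COLOR) }

  def classic_old?
    year < CLASSIC_YEAR_CUTOFF && color == CLASSIC_COLOR
  end
end

The comparison still appears twice (once in SQL, once in Ruby), but it is confined to a single class, which is usually an acceptable DRY compromise.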

Way to optimise a mapping in Informatica

I would like to optimise a mapping developed by one of my colleagues, where the "loading part" (to a flat file) is really, really slow: 12 rows per second.
Currently it takes about 2 hours to get to the point where I start writing to the file, so I would like to know where I should start looking first; otherwise I will need at least 2 hours between each improvement, which is not really efficient.
OK, so to describe simply what is done:
Oracle table (with a big query inside; takes about 2 hours to get a result)
SQ (Source Qualifier)
2 Lookups on a reference table (should not be heavy)
Update Strategy
1 transformation
2 Lookups (on big tables; that should be one optimization point, I guess: change them to Joiners)
6 stored procedures (these also seem a bit heavy, what do you think?)
another transformation
load into the flat file
Can you confirm that either the Lookups or the stored procedures could be the reason why it is so slow?
Do you think I should look somewhere else to optimize? I was thinking of maybe using only 1 transformation.
First, check the session logs carefully. Look at the timestamps; they should give you an initial idea of which part causes the delay.
Lookups on big tables are not recommended. Joiners are a better way, but they still need to cache data. Can you limit the data that gets cached, perhaps? It'll be very hard to advise without seeing it.
Which leads us to the Stored Procedures: it's simply impossible to tell anything about them just like that.
So: first collect the stats and do log analysis. Next, read some tuning guides on the net; there are plenty. Here's a comprehensive one, but it is large, so you might like to look for some shorter ones as well:
PowerCenter Performance Tuning Guide

Complex derived attributes in Django models

What I want to do is implement submission scoring for a site with users voting on the content, much like in e.g. reddit (see the 'hot' function in http://code.reddit.com/browser/sql/functions.sql). Edit: Ultimately I want to be able to retrieve an arbitrarily filtered list of arbitrary length of submissions ranked according to their score.
My submission model currently keeps track of up and down vote totals. Currently, when a user votes, I create and save a related Vote object and then use F() expressions to update the Submission object's voting totals. The problem is that I want to update the score for the submission at the same time, but F() expressions are limited to simple operations (they lack support for log(), date_part(), sign(), etc.).
From my limited experience with Django I can see 5 options here:
extend F() somehow (haven't looked at the code yet) to support the missing SQL functions; this is my preferred option and seems to fit within the Django framework the best
define a scoring function (much like reddit's 'hot' function) in my database, and have Django use the value of that function for the value of the score field; as far as I can tell, #2 is not possible
wrap my two step voting process in a suitably isolated transaction so that I can calculate the voting totals in Python and then update the Submission's voting totals without fear that another vote against the submission could be added/changed in the meantime; I'm hesitant to take this route because it seems overly complex - what is a "suitably isolated transaction" in this case anyway?
use raw SQL; I would prefer to avoid this entirely -- what's the point of an ORM if I have to revert to SQL for such a common use case as this! (Note that this coming from somebody who loves sprocs, but is using Django for ease of development.)
(edit: added this after further discussion) compute the score using an extra select parameter containing a call to my function; this would work but impose unnecessary load on the DB (would be forced to calculate the score for every submission ever made every time the query ran; caching could help here, but it still seems like a bit of lame workaround)
Before I embark on this mission to extend F() (which I'm not sure is even possible), am I about to reinvent the wheel? Is there a more standard way to do this? It seems like such a common use case and yet in an hour of searching I have yet to find a common solution...
EDIT: There is another option: set the default value of the field in the database script to be an expression containing my function. This is not as flexible as #1, but probably the quickest and cleanest approach to solving the problem (although my initial investigation into extending F() looks promising).
Why can't you just denormalize the score and reconstruct it from the Vote objects every once in a while?
If you can't do that, it is very easy to make a 'property' function that acts as an object attribute for scoring.
@property
def score(self):
    # ... calculate score from Vote objects ...
    return score
I've never used F() on a property like this, but it's Python, so I bet it works.
If you are using django-voting (which I recommend), you can put #3 in the manager's record_vote function since that's how all vote transactions take place.