How can I reduce the database call time and number (Rails)?

So I'm working on a Rails app for a building that keeps track of water usage/collection, electricity use/solar generation, etc. These are stored as measurement rows attached to sensors; sensors are attached to programs (essentially, a location in the building) and to subtypes, which are in turn attached to types (water, electricity).
I'm doing some graphing with chartkick, and the database calls related to this are way too slow. They'll be much faster on the production servers, but there will also be far more data.
Here's the helper method that has the chart generation and database call in it:
def stackedSubtypeChart(grouping)
  rsubs = @resource.subtypes
    .order(:usage?)                          # add usage types after gen types
    .map { |stype| [
      stype.name,
      stype.measurements                     # this takes too long!
        .where("date >= ?", params[:start])  # (4 calls!!)
        .where("date <= ?", params[:stop])
        .group_by_period(grouping, :date).maximum(:amount)
    ] }
  rsubs = rsubs.map { |stype|
    { name: stype[0],
      data: stype[1] } }
  ret = column_chart rsubs,
    stacked: true,
    library: { series: { 0 => { type: "line" } } }
end
@resource is defined in the controller as:
@resource = Type.includes(:subtypes => :sensors).find_by_resource('electricity')
I've commented the line that's responsible for there being multiple calls, which is definitely part of the problem. This takes two seconds to load on my (admittedly very very old) computer with a month of data.
I could really use help with both changing the map so that this is one call instead of however-many-subtypes calls, and with reducing what I'm pulling in so each call isn't taking half a second. I don't have a ton of experience optimizing this sort of thing and I'm not really sure how to start doing more than I have here already.

Might be helpful to look into ActiveRecord Explain to dig into the SQL. There's a good screencast that explains it (pun totally intended) pretty well.
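For example, a minimal sketch of using it from the Rails console (model and column names taken from the question; 1.month.ago stands in for the params):
Measurement.where("date >= ?", 1.month.ago).explain
# Prints the EXPLAIN output for the generated SQL, so you can spot sequential scans
# or a missing index on measurements.date.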

After a lot of bashing my head against a wall, I stumbled across this, which is a much faster single query that grabs all the data + data connections I need. It's a little hard to format but it works.
rsubs = Measurement
  .where("measurements.date >= ? AND measurements.date <= ?",
         offset(params[:start], -1, grouping),
         offset(params[:stop], 1, grouping))
  .joins(sensor: { subtype: :type })
  .where("types.resource = ?", @rname)
  .order('subtypes."usage?"')
  .group_by_period(grouping, :date)
  .group("subtypes.id, subtypes.name")
  .maximum(:amount)
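For reference, a hedged sketch of feeding that single-query result to chartkick. The composite keys come back as arrays; the [id, name, period] ordering below is an assumption that may need adjusting for your groupdate/ActiveRecord versions:
series = rsubs
  .group_by { |(_id, name, _period), _max| name }
  .map { |name, rows| { name: name, data: rows.map { |(_id, _name, period), max| [period, max] } } }
column_chart series, stacked: true, library: { series: { 0 => { type: "line" } } }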

Related

RSpec: How to mock SQL NOW()

I can mock Time.now with the great Timecop gem.
Time.now
=> 2018-05-13 18:04:46 +0300
Timecop.travel(Time.parse('2018.03.12, 12:00'))
Time.now
=> 2018-03-12 12:00:04 +0300
TeacherVacation.first.ends_at
=> Thu, 15 Mar 2018 12:00:00 MSK +03:00
TeacherVacation.where('ends_at > ?', Time.now).count
=> 1
But (obviously) this wouldn't work while using NOW() in a query:
TeacherVacation.where('ends_at > NOW()').count
=> 0
Can I mock NOW() so that it would return the results for a certain time?
Timecop is a great gem! I would recommend using Timecop.freeze instead of travel for your case; you want to keep your tests deterministic. For instance:
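A minimal sketch of the block form, which restores the clock automatically when the block exits:
Timecop.freeze(Time.zone.parse('2018-03-12 12:00')) do
  expect(TeacherVacation.where('ends_at > ?', Time.now).count).to eq(1)
end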
As far as I could find, there doesn't seem to be a way to mock SQL's functions. Some databases, such as Postgres, allow overloading functions, but you would still need a way to interject, and there doesn't seem to be a way to use environment variables in SQL.
A co-worker seemed to be certain you could actually drop system/language functions and make your own, but I was concerned about how to recover them after you do that. Trying to go that route sounds like a pain.
Solutions?
Here are a couple of "solutions" I've come up with today while fighting this problem. Note: I don't really care for them, to be honest, but if it gets tests in place ¯\_(ツ)_/¯ they at least offer a way to get things "working".
Unfortunately there's no snazzy gem to control the time in SQL. I imagine you would need something crazy like a plugin for the DB, a hack, a hook, a man in the middle, or a container that could trick SQL into thinking the system time was something else. None of those hacks would be portable or platform-agnostic, unfortunately.
Apparently there are ways to set the time inside a Docker container, but that sounds like painful overhead for local testing, and it doesn't fit the granularity of setting a different time per test.
Another thing to note: in my case we're running large, complex raw SQL queries, which is why it matters that I get proper dates when I run a SQL file in a test; otherwise I would just do it through ActiveRecord like you mentioned.
String Interpolation
I ran across this in some large queries that were being run.
It definitely helps if you need to push some environment variables through, and you can inject your own "current_date" if you want. It also helps if you need to use a certain time across multiple queries.
my_query.rb
<<~HEREDOC
  SELECT *
  FROM #{@prefix}.my_table
  -- quote the interpolated date so Postgres compares it as a date rather than doing integer arithmetic
  WHERE date < '#{@current_date}' - INTERVAL '5 DAYS'
HEREDOC
sql_runner.rb
class SqlRunner
  def initialize(file_path)
    @file_path = file_path
    @prefix = ENV['table_prefix']
    @current_date = Date.today
  end

  def run
    execute(eval(File.read(@file_path)))
  end

  private

  def execute(sql)
    ActiveRecord::Base.connection.execute(sql)
  end
end
The Dirty Update
The idea is to update the value from Ruby land, pushing your "time-copped" time into the database to overwrite the value the SQL DB generated. You may need to get creative with the update for times, e.g. querying for rows with a time greater than some cutoff so you don't re-match the Timecop time you're updating rows to.
The reason I don't care for this method is that it ends up feeling like you're just testing ActiveRecord's functionality, since you're no longer relying on the DB to set the values it should be setting. You may have computations in your SQL that you then recreate in the test to set some value to the right date; at that point the computation isn't happening in the SQL anymore, so you're not actually testing it.
large_insert.sql
INSERT INTO some_table (
  name,
  created_on
)
SELECT
  name,
  current_date
FROM projects
JOIN people ON projects.id = people.project_id
insert_spec.rb
describe 'insert_test.sql' do
  ACTUAL_DATE = Date.today
  LARGE_INSERT_SQL = File.read('sql/large_insert.sql')

  before do
    Timecop.freeze Date.new(2018, 10, 28)
  end

  after do
    Timecop.return
  end

  context 'populated some_table' do
    before do
      execute(LARGE_INSERT_SQL)
      mock_current_dates(ACTUAL_DATE)
    end

    it 'has the right date' do
      expect(SomeTable.last.created_on).to eq(Date.parse('2018.10.28'))
    end
  end

  def execute(sql_command)
    ActiveRecord::Base.connection.execute(sql_command)
  end

  def mock_current_dates(actual_date)
    rows = SomeTable.where(created_on: actual_date)
    # Overwrite with our Timecop date (Date.today is frozen to 2018-10-28 here)
    rows.update_all(created_on: Date.today)
  end
end
Fun caveat: specs wrap themselves in their own transactions (you can turn that off, but it's a nice feature), so if your SQL has a transaction in it you'll need code to strip it out for the specs, or have your runner wrap the code in transactions when you need them. The SQL will run, but its COMMIT will kill off the spec's transaction and you'll have a bad time. You can add a spec/support helper for the clean-up route; if I were on a newer project I would write a runner that wraps the queries in transactions when needed, even though that isn't evident from the SQL files themselves #abstraction. (A sketch of that runner idea follows.)
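For what it's worth, a hedged sketch of that runner-owns-the-transaction idea (the run_in_transaction name is hypothetical):
class SqlRunner
  def run_in_transaction(file_path)
    sql = File.read(file_path)
    # The runner owns the transaction, so the .sql files stay free of BEGIN/COMMIT
    # and won't clash with the transaction RSpec wraps around each example.
    ActiveRecord::Base.connection.transaction do
      ActiveRecord::Base.connection.execute(sql)
    end
  end
end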
Maybe there's something out there that lets you set your system time, but modifying your system's actual clock sounds terrifying.
I think the solution for this is DI (dependency injection):
def NOW(time = Time.now)
  time
end
In test
current_test = Time.new(2018, 5, 13)
p NOW(current_test)
In production
p NOW() # the parentheses are required here, otherwise Ruby looks for a constant named NOW
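The point being that the comparison time now comes from Ruby (and therefore from Timecop) rather than from the database, e.g.:
TeacherVacation.where('ends_at > ?', NOW()).count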

How do you do pagination in GUN?

How do you do something like gun.get({startkey, endkey}) ?
Previously: https://github.com/amark/gun/issues/479
@qwe123wsx @sebastianmacias apologies for the delay! Originally posted at: https://github.com/amark/gun/issues/479
The wire spec has a protocol for this but it isn't implemented yet. It looks something like this:
gun.on('out', {get: {'#': {'>': 'a', '<': 'b'}}});
However this doesn't work yet. I would recommend instead:
(1) Pagination behavior varies a lot from one app to another and would be hard for us to cover with a "one-size-fits-all" solution, so it would be hugely helpful if you could implement your own* pagination and make it available as a user module; then we can learn from your experience (what worked, what didn't) and make the best solution part of core.
(2) Your app will probably work fine without pagination in the meanwhile, while it is being built (it is targeted for after 1.0), and then, as your app becomes more popular, it should be fairly easy to add in without much refactoring, once you need it and it is available.
... * How to build your own?
Lots of good articles on this; the best one I've seen yet is about Neo4j and how to do it in a graph database (which applies to gun as well): https://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html .
Another rough idea is to model your data around pagination or time. Rather than putting ALL tweets into the user's tweet table, make the user's tweet table a table of DAYS (or weeks), and put each tweet inside its week table. Now when you load the data you can scan/skip by week very easily, while staying very bandwidth-efficient.
Rough PSEUDO code:
function onTweetSend(tweet){
  gun.get('user').get('alice').get('tweets').get(Date.uniqueYear() + Date.uniqueWeek()).set(tweet)
}
function paginateUserTweet(howMany, cb){
  var range = convertToArrayOfUniqueWeekNamesFromToday(howMany);
  var all = [];
  range.forEach(function(week){
    gun.get('user').get('alice').get('tweets').get(week).load(function(tweets){
      all.push(tweets);
      if(all.length < range.length){ return }
      all = flattenArray(all);
      cb(all);
    });
  });
}
Now we can use https://gun.eco/docs/RAD#lex
gun.get(...).get({'.': {'>': startkey, '<': endkey}, '%': 50000}).map().once(...)

Best design patterns for refactoring code without breaking other parts

I have some PHP code from an application that was written using Laravel. One of the modules was written quite poorly. The controller of this module has a whole bunch of reporting functions which use the functions defined inside the model object to fetch data, and the functions inside the model object are super messy.
Below is a list of some of the functions from the controller (ReportController.php) and the model (Report.php). (I'm only giving the names of the functions, not their implementations, as my question is design-related.)
Functions from ReportController.php
questionAnswersReportPdf()
fetchStudentAnswerReportDetail()
studentAnswersReport()
wholeClassScoreReportGradebookCSV()
wholeClassScoreReportCSV()
wholeClassScoreReportGradebook()
formativeWholeClassScoreReportGradebookPdf()
wholeClassScoreReport()
fetchWholeClassScoreReport()
fetchCurriculumAnalysisReportData()
curriculumAnalysisReportCSV()
curriculumAnalysisReportPdf()
studentAnswersReportCSV()
fetchStudentScoreReportStudents()
Functions from Report.php
getWholeClassScoreReportData
getReportsByFilters
reportMeta
fetchCurriculumAnalysisReportData
fetchCurriculumAnalysisReportGraphData
fetchCurriculumAnalysisReportUsersData
fetchTestHistoryClassAveragesData
fetchAllTestHistoryClassAveragesData
fetchAllTestHistoryClassAveragesDataCsv
fetchHistoryClassAveragesDataCsv
fetchHistoryClassAveragesGraphData
The business logic has also been written in quite a messy way: some parts of it are in the controller while other parts are in the model object.
I have 2 specific questions :
a) I have an ongoing goal of reducing code complexity and optimizing code structure. How can I leverage common OOP design patterns to ensure altering the code in any given report does not negatively affect the other reports? I specifically want to clean up the code for some critical reports first but want to ensure that by doing this none of the other reports will break.
b) The reporting module is relatively static in definition and unlikely to change over time. The majority of reports generated by the application involve nested sub-queries as well as standard grouping and filtering options. Most of these SQL queries are housed within the functions of the model object and contain some really complex joins. Without spending time evaluating the database structure or table indices, which solution-architecture techniques would you recommend for scaling the report functionality to ensure optimized performance? Below is a snippet of one of the SQL queries:
$sql = 'SELECT "Parent"."Id",
"Parent"."ParentId",
"Parent"."Name" as systemStandardName,
string_agg(DISTINCT((("SubsectionQuestions"."QuestionSerial"))::text) , \', \') AS "quesions",
count(DISTINCT("SubsectionQuestions"."QuestionId")) AS "totalQuestions",
case when sum("SQUA"."attemptedUsers")::float > 0 then
(COALESCE(round((
(
sum(("SQUA"."totalCorrectAnswers"))::float
/
sum("SQUA"."attemptedUsers")::float
)
*100
)::numeric),0))
else 0 end as classacuracy,
case when sum("SQUA"."attemptedUsers")::float > 0 then
(COALESCE((round(((1 -
(
(
sum(("SQUA"."totalCorrectAnswers"))::float
/
sum("SQUA"."attemptedUsers")::float
)
)
)::float * count(DISTINCT("SubsectionQuestions"."QuestionId")))::numeric,1)),0))
else 0 end as pgain
FROM "'.$gainCategoryTable.'" as "Parent"
'.$resourceTableJoin.'
INNER JOIN "SubsectionQuestions"
ON "SubsectionQuestions"."QuestionId" = "resourceTable"."ResourceId"
INNER JOIN "Subsections"
ON "Subsections"."Id" = "SubsectionQuestions"."SubsectionId"
LEFT Join (
Select "SubsectionQuestionId",
count(distinct case when "IsCorrect" = \'Yes\' then CONCAT ("UserId", \' \', "SubsectionQuestionId") else null end) AS "totalCorrectAnswers"
, count(distinct CONCAT ("UserId", \' \', "SubsectionQuestionId")) AS "attemptedUsers"
From "SubsectionQuestionUserAnswers"';
if(!empty($selectedUserIdsArr)){
$sql .= ' where "UserId" IN (' .implode (",", $selectedUserIdsArr).')' ;
}else {
$sql .= ' where "UserId" IN (' .implode (",", $assignmentUsers).')' ;
}
$sql .= ' AND "AssessmentAssignmentId" = '.$assignmentId.' AND "SubsectionQuestionId" IN ('.implode(",", $subsectionQuestions).') Group by "SubsectionQuestionId"
) as "SQUA" on "SQUA"."SubsectionQuestionId" = "SubsectionQuestions"."Id"
INNER JOIN "AssessmentAssignment"
ON "AssessmentAssignment"."assessmentId" = "Subsections"."AssessmentId"
INNER JOIN "AssessmentAssignmentUsers"
ON "AssessmentAssignmentUsers"."AssignmentId" = "AssessmentAssignment"."Id"
AND "AssessmentAssignmentUsers"."Type" = \'User\'
'.$conditaionlJoin.'
WHERE "Parent"."Id" IN ('.implode(',', $ssLeaf).')
'.$conditionalWhere.'
GROUP BY "Parent"."Id",
"Parent"."ParentId",
"Parent"."Name"
'.$sorter;
$results = DB::select(DB::raw($sql));
My take on a): In my experience, when I want to reduce code complexity or sheer messiness, I slowly refactor out code that violates the single responsibility principle while I'm already working in that area for a bug fix or a feature update. I try not to spend hours upon hours updating code that is "working" and that I'm not actively touching for a business reason. Follow the "leave it better than you found it" approach as you work in this code base and it will get better over time. Doing this lets you improve the code base while still getting features and bug fixes out the door, and it keeps business owners/project managers happy because you're keeping things moving.
About a): The first thing I do to make sure none of my refactoring breaks anything is to cover the code with, at the very least, unit tests (doing TDD ensures the best coverage). It's easier when, as @DavidY says, you respect principles like SRP (does my class try to answer too many problems?). With tests you'll feel safer when you need to refactor, and they will tell you exactly where things broke.
About b): Do not optimize until you need to, and optimize only when you know what costs you the most. That's the best way to know which pattern you need; otherwise you may force the wrong solution onto the wrong problem.

phalcon querybuilder total_items always returns 1

I build a query via createBuilder(), and when executing it (getQuery()->execute()->toArray()) I get 10946 elements. I want to paginate it, so I pass it to:
$paginator = new \Phalcon\Paginator\Adapter\QueryBuilder(array(
    "builder" => $builder,
    "limit"   => $limit,
    "page"    => $current_page
));
$limit is 25 and $current_page is 1, but when doing:
$page = $paginator->getPaginate();
$page->total_items;
total_items returns 1.
Is that a bug or am I missing something?
UPD: it seems that when counting items it uses the generated SQL with the LIMIT applied; no matter what the limit is, the count divided by items per page always equals 1. I might be mistaken.
UPD2: A colleague helped me figure this out. The bug is in the query Phalcon produces: the COUNT() over the GROUP BY counts the grouped rows. So a workaround looks like:
$dataCount = $builder->getQuery()->execute()->count();
$page->next = $page->current + 1;
$page->before = $page->current - 1 > 0 ? $page->current - 1 : 1;
$page->total_items = $dataCount;
$page->total_pages = ceil($dataCount / 100);
$page->last = $page->total_pages;
I know this isn't much of an answer, but this is most likely a bug. The great guys at Phalcon took on a massive job that is too big to do properly in their limited free time, and things like PHQL, Volt and other big but non-core components don't receive as much attention as we'd like. Also, given that most of the time in the past six months was spent on v2, there are nearly 500 open bugs about stuff like this, and counting. I came across considerable issues in the ORM, Volt, Validation and Session, which in the end made me stick with other, less cool but more proven solutions. When v2 comes out I'm sure all attention will be on the bug list and testing; until then we are mostly on our own. Given that it's all C right now, only a few enthusiasts get involved; with v2 this will also change.
If this is the only problem you are hitting, the best approach is to update your query to get the information you need yourself without getPaginate().

Printing a pdf of more than 5000 pages takes longtime using Prawn pdf gem

I am using the Prawn PDF gem to generate a PDF.
I am formatting the data into tables and then printing it to the PDF. I have around 5000 pages (about 50000 entries) to print and it takes forever. For a small number of pages it's quick... Is there any way I can improve the speed of printing?
Also, printing without the data in table format was quick. Please help me out with this.
code for this :
format.pdf do
  pdf = Prawn::Document.new(:margin => [20, 20, 20, 20])
  pdf.font "Helvetica"
  pdf.font_size 12
  @test_points_all = Hash.new
  dataset_id = Dataset.where(collection_success: true).order('created_at DESC').first
  if inode.leaf?
    meta = MetricInstance.where(dataset_id: dataset_id, file_or_folder_id: inode.id).includes(:test_points, :file_or_folder, :dataset).first
    @test_points_all[inode.name] = meta.test_points
  else
    nodes2 = []
    nodes2 = inode.leaves
    if !nodes2.nil?
      nodes2.each do |node|
        meta = MetricInstance.where(dataset_id: dataset_id, file_or_folder_id: node.id).includes(:test_points, :file_or_folder, :dataset).first
        @test_pointa = meta.test_points
        if !@test_pointa.nil?
          @test_points_all[node.name] = @test_pointa
        end
      end
    end
  end
  @test_points_all.each do |key, points|
    table_data = [["<b> #{key} </b>", "<b>433<b>", "xyz", "xyzs"]]
    points.each do |test|
      td = TestDescription.find(:first, :conditions => ["test_point_id=?", test.id])
      if !td.nil?
        table_data << ["#{test.name}", "#{td.header_info}", "#{td.comment_info}", "#{td.line_number}"]
      end
      pdf.move_down(5)
      pdf.table(table_data, :width => 500, :cell_style => { :inline_format => true, :border_width => 0 }, :row_colors => ["FFFFFF", "DDDDDD"])
      pdf.text ""
      pdf.stroke do
        pdf.horizontal_line(0, 570)
      end
      pdf.move_down(5)
    end
  end
  pdf.number_pages("<page> of <total>", {
    :start_count_at => 1,
    :page_filter => lambda { |pg| pg > 0 },
    :at => [pdf.bounds.right - 50, 0],
    :align => :right,
    :size => 9
  })
  pdf.render_file File.join(Rails.root, "app/reports", "x.pdf")
  filename = File.join(Rails.root, "app/reports", "x.pdf")
  send_file filename, :filename => "x.pdf", :type => "application/pdf", :disposition => "inline"
end
The first of those two lines is pointless, take it out!
nodes2 = []
nodes2 = inode.leaves
Based on your information, I understand that the following query to the database is performed around 50,000 times... Depending on the volume and content of your table, it might be very reasonable to perform one single query (fetching the whole table) at the start of your script and keep that data in memory, doing any subsequent operations on it in pure Ruby without talking to the database. Then again, if the table you are working with is insanely huge, it might clog up your memory and not be a good idea at all. It really depends, so figure it out! (A sketch of the idea follows the query below.)
TestDescription.find(:first, :conditions=>["test_point_id=?", test.id])
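A hedged sketch of that idea, assuming TestDescription has a test_point_id column (as the query above implies) and the table fits comfortably in memory:
# One query up front, then pure-Ruby lookups inside the loop:
descriptions_by_point = TestDescription.all.index_by(&:test_point_id)
# ... later, inside points.each do |test| ...
td = descriptions_by_point[test.id]  # no database round trip per test point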
Also, since you say printing without tables was very quick, you might achieve a major speedup by reimplementing yourself the small part of the table functionality you actually use, with only low-level functions from Prawn. Why? Prawn's table function is built to cover as many use cases as possible and therefore carries a lot of overhead (at least from the perspective of someone who only needs barebones functionality; for everyone else that "overhead" is gold!). So you can implement just the little part of tables you need yourself, and that might give you a major performance boost. Give it a shot!
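A hedged sketch of that barebones approach, using only low-level Prawn calls (the fixed column widths, row height and plain-text cells are assumptions; no borders, spanning or wrapping logic):
col_widths = [200, 60, 160, 80]
table_data.each do |row|
  x = 0
  row.each_with_index do |cell, i|
    pdf.text_box cell.to_s, at: [x, pdf.cursor], width: col_widths[i], height: 12,
                 size: 9, overflow: :shrink_to_fit
    x += col_widths[i]
  end
  pdf.move_down 14
  pdf.stroke_horizontal_line 0, col_widths.sum, at: pdf.cursor
  pdf.start_new_page if pdf.cursor < 40
end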
If you're using a recent version of ActiveRecord, I'd suggest using pluck in your inner loop. Instead of this:
td=TestDescription.find(:first, :conditions=>["test_point_id=?", test.id])
if (!td.nil?)
table_data << ["#{test.name}","#{td.header_info}","#{td.comment_info}","#{td.line_number}"]
end
Try this instead:
td = TestDescription.where(test_point_id: test.id)
.pluck(:name, :header_info, :comment_info, :line_number).first
table_data << td unless td.blank?
Instead of instantiating an ActiveRecord object for each TestDescription, you'll just get back an array of field values that you should be able to append directly to table_data, which is really all you need here. This means less memory usage, and less time spent in GC.
It might also be worth trying to use pluck to retrieve all the entries at once, in which case you'd have an array of arrays to loop over. This would take more memory than fetching one at a time, but a lot less than an array of AR objects, and you'd save doing separate db queries.
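A hedged sketch of that variant, fetching all the rows for the current points in one query and grouping them in Ruby (column names taken from the snippet above):
rows = TestDescription.where(test_point_id: points.map(&:id))
                      .pluck(:test_point_id, :name, :header_info, :comment_info, :line_number)
rows_by_point = rows.group_by(&:first)
points.each do |test|
  (rows_by_point[test.id] || []).each do |_point_id, *fields|
    table_data << fields.map(&:to_s)
  end
end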