Printing a PDF of more than 5000 pages takes a long time using the Prawn PDF gem

I am using the Prawn PDF gem to generate a PDF.
I format the data into tables and then print them to the PDF. I have around 5000 pages (about 50,000 entries) to print, and it takes forever; for a small number of pages it is quick. Is there any way I can improve the printing speed?
Also, printing the same data without the table formatting was quick. Please help me out with this.
The code for this:
format.pdf {
  pdf = Prawn::Document.new(:margin => [20, 20, 20, 20])
  pdf.font "Helvetica"
  pdf.font_size 12
  @test_points_all = Hash.new
  dataset_id = Dataset.where(collection_success: true).order('created_at DESC').first
  if inode.leaf?
    meta = MetricInstance.where(dataset_id: dataset_id, file_or_folder_id: inode.id).includes(:test_points, :file_or_folder, :dataset).first
    @test_points_all[inode.name] = meta.test_points
  else
    nodes2 = []
    nodes2 = inode.leaves
    if !nodes2.nil?
      nodes2.each do |node|
        meta = MetricInstance.where(dataset_id: dataset_id, file_or_folder_id: node.id).includes(:test_points, :file_or_folder, :dataset).first
        @test_pointa = meta.test_points
        if !@test_pointa.nil?
          @test_points_all[node.name] = @test_pointa
        end
      end
    end
  end
  @test_points_all.each do |key, points|
    table_data = [["<b> #{key} </b>", "<b>433</b>", "xyz", "xyzs"]]
    points.each do |test|
      td = TestDescription.find(:first, :conditions => ["test_point_id=?", test.id])
      if !td.nil?
        table_data << ["#{test.name}", "#{td.header_info}", "#{td.comment_info}", "#{td.line_number}"]
      end
      pdf.move_down(5)
      pdf.table(table_data, :width => 500, :cell_style => { :inline_format => true, :border_width => 0 }, :row_colors => ["FFFFFF", "DDDDDD"])
      pdf.text ""
      pdf.stroke do
        pdf.horizontal_line(0, 570)
      end
      pdf.move_down(5)
    end
  end
  pdf.number_pages("<page> of <total>", {
    :start_count_at => 1,
    :page_filter => lambda { |pg| pg > 0 },
    :at => [pdf.bounds.right - 50, 0],
    :align => :right,
    :size => 9
  })
  pdf.render_file File.join(Rails.root, "app/reports", "x.pdf")
  filename = File.join(Rails.root, "app/reports", "x.pdf")
  send_file filename, :filename => "x.pdf", :type => "application/pdf", :disposition => "inline"
}

The first of those two lines is pointless; take it out:
nodes2 = []
nodes2 = inode.leaves
Based on your information, I understand that the following query is performed around 50,000 times, once per entry. Depending on the volume and content of the table, it might be very reasonable to perform one single query (fetching the whole table) at the start of your script, keep that data in memory, and perform all following lookups on it in pure Ruby, without talking to the database. Then again, if the table you are working with is insanely huge, it might totally clog up your memory and not be a good idea at all. It really depends, so figure it out!
TestDescription.find(:first, :conditions=>["test_point_id=?", test.id])
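A minimal sketch of that approach, assuming the whole TestDescription table fits in memory and that each test point has at most one description (both assumptions, not facts from the question):

# One query up front; index_by builds a { test_point_id => record } lookup hash.
descriptions_by_point = TestDescription.all.index_by(&:test_point_id)

# Inside the loop, the per-row query becomes a hash lookup:
points.each do |test|
  td = descriptions_by_point[test.id]
  table_data << [test.name, td.header_info, td.comment_info, td.line_number] if td
end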
Also, if, as you say, printing without tables was very quick, you might achieve a major speedup by reimplementing the small part of the table functionality you actually use, with only low-level functions from Prawn. Why? Prawn's table function is surely made to fulfill as many use cases as possible, and therefore includes a lot of overhead (at least from the perspective of someone who needs only barebones functionality; for everyone else this "overhead" is gold!). Implementing just the little part of tables you need yourself might give you a major performance boost. Give it a shot!
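For instance, a stripped-down row renderer might look something like this. It is only a rough sketch: the column widths, row height, and alternating fill are assumptions, not values from your code.

# Draw each row with low-level Prawn calls instead of Prawn::Table.
col_widths = [150, 110, 160, 80] # assumed widths, totalling 500 like your :width option
table_width = col_widths.inject(:+)
row_height = 18

table_data.each_with_index do |row, i|
  pdf.start_new_page if pdf.cursor < row_height
  y = pdf.cursor
  if i.odd? # alternating background, like :row_colors
    pdf.fill_color "DDDDDD"
    pdf.fill_rectangle [0, y], table_width, row_height
  end
  pdf.fill_color "000000"
  x = 0
  row.each_with_index do |cell, j|
    pdf.text_box cell.to_s, :at => [x + 2, y - 2],
                 :width => col_widths[j] - 4, :height => row_height - 4,
                 :overflow => :shrink_to_fit, :inline_format => true
    x += col_widths[j]
  end
  pdf.move_down row_height
end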

If you're using a recent version of ActiveRecord, I'd suggest using pluck in your inner loop. Instead of this:
td = TestDescription.find(:first, :conditions => ["test_point_id=?", test.id])
if !td.nil?
  table_data << ["#{test.name}", "#{td.header_info}", "#{td.comment_info}", "#{td.line_number}"]
end
Try this instead:
td = TestDescription.where(test_point_id: test.id)
.pluck(:name, :header_info, :comment_info, :line_number).first
table_data << td unless td.blank?
Instead of instantiating an ActiveRecord object for each TestDescription, you'll just get back an array of field values that you should be able to append directly to table_data, which is really all you need here. This means less memory usage, and less time spent in GC.
It might also be worth trying to use pluck to retrieve all the entries at once, in which case you'd have an array of arrays to loop over. This would take more memory than fetching one at a time, but a lot less than an array of AR objects, and you'd save doing separate db queries.
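A sketch of that bulk variant, assuming you can collect the test point ids up front (point_ids here is a hypothetical array, not a variable from the question):

# One query for all rows; group them by test_point_id in Ruby.
rows = TestDescription.where(test_point_id: point_ids)
                      .pluck(:test_point_id, :name, :header_info, :comment_info, :line_number)
rows_by_point = rows.group_by(&:first)

points.each do |test|
  (rows_by_point[test.id] || []).each do |_, *fields|
    table_data << fields
  end
end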

Related

How to convert a for loop to NHibernate Futures for performance

NHibernate Version: 3.4.0.4000
I'm currently working on optimizing our code so that we can reduce the number of round trips to the database and am looking at a for loop that is one of the culprits. I'm having a hard time figuring out how to batch all of these iterations into a future that gets executed once when sent to SQL Server. Essentially each iteration of the loop causes 2 queries to hit the database!
foreach (var choice in lineItem.LineItemChoices)
{
    choice.OptionVersion = _session.Query<OptionVersion>()
        .Where(x => x.Option.Id == choice.OptionId)
        .OrderByDescending(x => x.OptionVersionNumber)
        .FirstOrDefault();
    choice.ChoiceVersion = _session.Query<ChoiceVersion>()
        .OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber)
        .Where(x => x.Choice.Id == choice.ChoiceId)
        .FirstOrDefault();
}
One option is to extract OptionId and ChoiceId from all the LineItemChoices into two lists in local memory, then issue just two queries, one for options and one for choices, passing these lists in .Where(x => optionIds.Contains(x.Option.Id)). This corresponds to the SQL IN operator. It requires some post-processing: you will get two result lists (transform them to a dictionary or lookup if you expect many results) that you need to process to populate the choice objects. This post-processing is local and tends to be very cheap compared to database round trips. This option can be a bit tricky if the existing FirstOrDefault part is absolutely necessary. Do you expect there to be more than one result for a single optionId? If not, this code could instead have used SingleOrDefault, which could simply be dropped when converting to IN queries.
The other option is to use futures (https://nhibernate.info/doc/nhibernate-reference/performance.html#performance-future). For LINQ this means calling ToFuture or ToFutureValue at the end, which I believe also conflicts with FirstOrDefault. The important thing is that you need to loop over all line item choices to initialize ALL the queries BEFORE you access the value of any of them. So this is likely to require some post-processing as well: first store the future values in a list, then in a second loop access the real value of each query to populate the line item choice.
If you do expect that the queries can yield more than one result (before applying FirstOrDefault), I think you can use Take(1) instead, as that still returns an IQueryable to which you can apply the future method.
The first option is probably the most efficient, since it will just be two queries and allow the database engine to make just one pass over the tables.
Keep the limit on the maximum number of parameters that can be given in an SQL query in mind. If there can be thousands of line item choices, you may need to split them in batches and query for at most 2000 identifiers per round trip.
Adding to Oskar's answer: NHibernate Futures were implemented in NHibernate 2.1. They are available via the Future method for collections and FutureValue for single values.
In your case, you could first collect the IDs from the list in memory ...
var optionIds = lineItem.LineItemChoices.Select(x => x.OptionId).ToList();
var choiceIds = lineItem.LineItemChoices.Select(x => x.ChoiceId).ToList();
... and execute two queries using Future<T> to get the two lists in a single hit on the database.
var optionVersions = _session.Query<OptionVersion>()
.Where(x => optionIds.Contains(x.Option.Id))
.OrderByDescending(x => x.OptionVersionNumber)
.Future<OptionVersion>();
var choiceVersions = _session.Query<ChoiceVersion>()
.Where(x => choiceIds.Contains(x.Choice.Id))
.OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber)
.Future<ChoiceVersion>();
Then, with everything you need in memory, you can loop over the original collection and search the in-memory results to fill up each choice object. The pending future queries execute together, in one round trip, the first time one of the future results is enumerated.
foreach (var choice in lineItem.LineItemChoices)
{
    choice.OptionVersion = optionVersions
        .OrderByDescending(x => x.OptionVersionNumber)
        .FirstOrDefault(x => x.Option.Id == choice.OptionId);
    choice.ChoiceVersion = choiceVersions
        .OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber)
        .FirstOrDefault(x => x.Choice.Id == choice.ChoiceId);
}

How can I reduce the database call time and number (Rails)?

So I'm working on a rails app for a building that keeps track of water usage/collection and electricity use/solar generation, etc. These are stored as measurement rows, attached to sensors, which are attached to programs (location in the building, essentially) and subtypes (attached to types - water, electricity).
I'm doing some graphing with chartkick, and the database calls related to this are way too slow. They'll be much faster on the production servers, but there will also be far more data.
Here's the helper method that has the chart generation and database call in it:
def stackedSubtypeChart(grouping)
  rsubs = @resource.subtypes
    .order(:usage?) # add usage types after gen types
    .map { |stype| [
      stype.name, stype.measurements # this takes too long!
        .where("date >= ?", params[:start]) # (4 calls!!)
        .where("date <= ?", params[:stop])
        .group_by_period(grouping, :date).maximum(:amount)
    ] }
  rsubs = rsubs.map { |stype|
    { name: stype[0],
      data: stype[1] } }
  ret = column_chart rsubs,
    stacked: true,
    library: { :series => { 0 => { type: "line" } } }
end
@resource is defined in the controller as:
@resource = Type.includes(:subtypes => :sensors).find_by_resource('electricity')
I've commented the line that's responsible for there being multiple calls, which is definitely part of the problem. This takes two seconds to load on my (admittedly very very old) computer with a month of data.
I could really use help with both changing the map so that this is one call instead of however-many-subtypes calls, and with reducing what I'm pulling in so each call isn't taking half a second. I don't have a ton of experience optimizing this sort of thing and I'm not really sure how to start doing more than I have here already.
It might be helpful to look into ActiveRecord Explain to dig into the SQL. There's a good screencast that explains (pun totally intended) it pretty well.
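For example, you can print the query plan for one of the slow calls straight from the console (a sketch reusing names from the question):

puts stype.measurements
          .where("date >= ?", params[:start])
          .where("date <= ?", params[:stop])
          .explain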
After a lot of bashing my head against a wall, I stumbled across this, which is a much faster single query that grabs all the data + data connections I need. It's a little hard to format but it works.
rsubs = Measurement
  .where("measurements.date >= ? AND measurements.date <= ?",
         offset(params[:start], -1, grouping),
         offset(params[:stop], 1, grouping))
  .joins(sensor: { subtype: :type })
  .where("types.resource = ?", @rname)
  .order('subtypes."usage?"')
  .group_by_period(grouping, :date)
  .group("subtypes.id, subtypes.name")
  .maximum(:amount)

More efficient Active Record query for large number of columns

I'm trying to work out a more efficient way to add a note count, with a couple of simple where conditions applied to the query. This can take forever, though, as there are as many as 20K records to iterate over. Would welcome any thinking on this.
def reblog_array(notes)
  data = []
  notes.select('note_type, count(*) as count')
       .where(:note_type => 'reblog', :created_at => Date.today.years_ago(1)..Date.today)
       .group('DATE(created_at)')
       .each do |n|
    data << n.count
  end
  return data
end
This is what's passed to reblog_array(notes) from my controller.
@tumblr = Tumblr.find(params[:id])
@notes = Note.where("tumblr_id = '#{@tumblr.id}'")
From what I can tell, you are trying to calculate how many reblogs/day this Tumblr account/blog had? If so,
notes.where(:note_type => 'reblog', :created_at => Date.today.years_ago(1)..Date.today).group('DATE(created_at)').count.values
should give you the right result without having to iterate over the result list again. One thing to note: your current call won't indicate days with 0 reblogs. If you drop the call to #values, you'll get a hash of date => count instead.
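If you do need the zero days, one option is to merge the grouped counts into a full date range afterwards. A sketch, assuming daily grouping (the hash keys may come back as Date objects or strings depending on your adapter, hence the double lookup):

counts = notes.where(:note_type => 'reblog',
                     :created_at => Date.today.years_ago(1)..Date.today)
              .group('DATE(created_at)').count

data = (Date.today.years_ago(1)..Date.today).map do |day|
  counts[day] || counts[day.to_s] || 0
end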
As an aside, and in case you didn't know, I'd also suggest making more use of ActiveRecord relations:
class Tumblr
  has_many :notes
end
@tumblr = Tumblr.find(params[:id])
@notes = @tumblr.notes
This way you avoid writing code like Note.where("tumblr_id = '#{@tumblr.id}'"). It's best to avoid string-interpolated parameters in favour of code like Note.where(:tumblr_id => @tumblr.id) or Note.where("tumblr_id = ?", @tumblr.id), leaving less chance that you'll write code vulnerable to SQL injection.

Rails SQL efficiency for where statement

Is there a more efficient way to write the Rails query in the following code?
It will be called across the site to hide certain content or users based on whether a user is blocked, so it needs to be fairly efficient or it will slow everything else down as well.
users.rb file:
def is_blocked_by_or_has_blocked?(user)
  status = relationships.where('followed_id = ? AND relationship_status = ?',
                               user.id, relationship_blocked).first ||
           user.relationships.where('followed_id = ? AND relationship_status = ?',
                                    self.id, relationship_blocked).first
  return status
end
In that code, relationship_blocked is just an abstraction of an integer to make it easier to read later.
In a view, I am calling this method like this:
- unless current_user.is_blocked_by_or_has_blocked?(user)
  - # show the content for unblocked users here
Edit
This is a sample query; it stops after it finds the first instance (no need to check for a reverse relationship):
Relationship Load (0.2ms) SELECT "relationships".* FROM "relationships" WHERE ("relationships".follower_id = 101) AND (followed_id = 1 AND relationship_status = 2) LIMIT 1
You can change it to run only one query by making it use an IN (x,y,z) statement (done by passing an array of ids to :followed_id). Also, by using .count you bypass Rails instantiating a model instance for each resulting relationship, which keeps things faster (less data to pass around in memory):
def is_blocked_by_or_has_blocked?(user)
  relationships.where(:followed_id => [user.id, self.id],
                      :relationship_status => relationship_blocked).count > 0
end
Edit: to get it to look both ways:
Relationship.where(:user_id => [user.id, self.id], :followed_id => [user.id, self.id], :relationship_status => relationship_blocked).count > 0
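A further tweak worth trying: .exists? is typically cheaper than .count > 0 here, since it issues a SELECT with LIMIT 1 instead of a full COUNT:

Relationship.where(:user_id => [user.id, self.id], :followed_id => [user.id, self.id],
                   :relationship_status => relationship_blocked).exists?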

Optimize the query (PostgreSQL 8.4)

I have a Rails controller with the following code:
@checked_contact_ids = @list.contacts.all(
  :conditions => {
    "contacts_lists.contact_id" => @list.contacts.map(&:id),
    "contacts_lists.is_checked" => true
  }
).map(&:id)
It is equivalent to this SQL:
SELECT *
FROM "contacts"
INNER JOIN "contacts_lists" ON "contacts".id = "contacts_lists".contact_id
WHERE ("contacts_lists".list_id = 67494 )
The above query takes a long time to run; I want another way to run the same query in less time. Does anyone know one, or is the above query already as good as it gets?
I think the main problem with your original AR query is that it isn't doing any joins at all: you pull a bunch of objects out of the database via @list.contacts and then throw most of that work away to get just the IDs.
A first step would be to replace the "contacts_lists.contact_id" => @list.contacts.map(&:id) condition with a :joins => 'contacts_lists', but you'd still be pulling a bunch of rows out of the database, instantiating a bunch of objects, and then throwing it all away with the .map(&:id) to get just the ID numbers.
You know SQL already, so I'd probably go straight to SQL via a convenience method on your List model (or whatever @list is), something like this:
def checked_contact_ids
  connection.execute(%Q{
    SELECT contacts.id
    FROM contacts
    INNER JOIN contacts_lists ON contacts.id = contacts_lists.contact_id
    WHERE contacts_lists.list_id = #{self.id}
      AND contacts_lists.is_checked = 't'
  }).map { |r| r['id'] }
end
And then, in your controller:
@checked_contact_ids = @list.checked_contact_ids
If that isn't fast enough, review your indexes on the contacts_lists table.
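A composite index covering that lookup might look like this (a hypothetical migration; the column order is an assumption based on the WHERE clause above):

add_index :contacts_lists, [:list_id, :is_checked, :contact_id]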
There's no good reason not to go straight to SQL when you know exactly what data you need and you need it fast; just keep the SQL isolated inside your models and you shouldn't have any problems.
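On newer Rails you could likely get the same result without raw SQL by combining the association join with pluck, as suggested earlier in this thread (a sketch, assuming the contacts association joins through contacts_lists):

def checked_contact_ids
  contacts.where("contacts_lists.is_checked" => true).pluck("contacts.id")
end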