I have a model Team, and I've got (for example) team = Team.first(:offset => 20). Now I need to get the position of that team in the database table.
I can do it in Ruby:
Team.all.index team #=> 20
But I'm sure this can be written in SQL, and it will be much less expensive with big tables.
Assuming the rows are ordered by ID ascending:
class Team < ActiveRecord::Base
  def position
    self.class.count(:conditions => ['id <= ?', self.id])
  end
end
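For instance, a hypothetical check (assuming rows come back in id order), showing how the two approaches line up; note the SQL count is one-based:
team = Team.first(:offset => 20, :order => 'id')
Team.all.index(team) # => 20 (zero-based)
team.position        # => 21 (one-based count of rows with id <= team.id)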
I'm not sure I'm following along, but if the position value will be used a lot, it might be worth making sure it is set on save. Will a team change position over time, or is it static?
I don't think the position of a record in the database is one of the things a database really cares for, that kind of data should be up to you to set.
Depending on the database you may be able to use the LIMIT clause, but you can't use it to find the position of a certain row; you can only use it to select a slice of rows, and the result will differ depending on the sort order. In MySQL, LIMIT offset, count skips the first offset rows, so this returns the 21st and 22nd rows:
SELECT * FROM teams LIMIT 20, 2
I think setting a position attribute on your model or having a join model is the way to go.
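If you do store it, a minimal sketch of that idea (assuming you add an integer position column and keep id-ascending ordering; the callback name is made up):
class Team < ActiveRecord::Base
  # Recompute the id-based rank once the row exists (so id is populated).
  after_create :set_position

  private

  # Assumes an integer `position` column on teams.
  def set_position
    update_attribute(:position, self.class.count(:conditions => ['id <= ?', id]))
  end
end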
I have a listing of ~10,000 apps and I'd like to order them by certain columns, but I want to give certain columns more "weight" than others.
For instance, each app has overall_ratings and current_ratings. If the app has a lot of overall_ratings, that's worth 1.5, but the number of current_ratings would be worth, say 2, since the number of current_ratings shows the app is active and currently popular.
Right now there are probably 4-6 of these variables I want to take into account.
So, how can I pull that off? In the query itself? After the fact using just Ruby (remember, there are over 10,000 rows that would need to be processed here)? Something else?
This is a Rails 3.2 app.
Sorting 10,000 objects in plain Ruby doesn't seem like a good idea, especially if you just want the first 10 or so.
You can try to put your math formula in the query (using the order method from Active Record).
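For example, a rough sketch of pushing the weights into the query (the App model name and the counter columns overall_ratings_count and current_ratings_count are assumptions; use whatever your schema actually has):
# Let the database compute the weighted score and sort by it.
App.select("apps.*, (1.5 * overall_ratings_count + 2 * current_ratings_count) AS score")
   .order("score DESC")
   .limit(10)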
However, my favourite approach would be to create a float attribute to store the score and update that value with a before_save method.
I would read about dirty attributes so that you only perform this scoring when one of your criteria is updated.
You may also create a rake task that re-scores your current objects.
This way you would keep the scoring functionality in Ruby (you could test it easily) and you could add an index to your float attribute so database queries have better performance.
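A minimal sketch of that stored-score approach (the score column, the weights, and the counter columns are assumptions):
class App < ActiveRecord::Base
  before_save :update_score, :if => :score_inputs_changed?

  private

  # Dirty tracking: only recompute when one of the inputs actually changed.
  def score_inputs_changed?
    overall_ratings_count_changed? || current_ratings_count_changed?
  end

  def update_score
    self.score = 1.5 * overall_ratings_count + 2 * current_ratings_count
  end
end
With an index on score, something like App.order('score DESC').limit(10) then stays cheap.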
One attempt would be to let the DB do this work for you with a query like the following (I can't really test it without your schema):
ActiveRecord::Base.connection.execute("SELECT *,
    (1.5 * (SELECT COUNT(*) FROM overall_ratings
            WHERE app_id = a.id) +
     2 * (SELECT COUNT(*) FROM current_ratings
          WHERE app_id = a.id))
    AS rating FROM apps a
    WHERE true HAVING rating > 3 ORDER BY rating DESC")
The idea is to count the overall and current ratings for each app with the subqueries, weight them as desired, and sum the results.
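If you'd rather get model objects back than raw result rows, the same query can go through find_by_sql; the computed rating then shows up as an extra read-only attribute on each object. A sketch under the same assumed schema and model name:
apps = App.find_by_sql(<<-SQL)
  SELECT apps.*,
         (1.5 * (SELECT COUNT(*) FROM overall_ratings WHERE app_id = apps.id) +
          2   * (SELECT COUNT(*) FROM current_ratings WHERE app_id = apps.id)) AS rating
  FROM apps
  ORDER BY rating DESC
SQL
apps.first.rating  # the computed score, typed as the adapter returns it (often a string)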
I would like some help constructing SQL queries for use in Rails with activerecord-postgis-adapter. I have been doing quite a bit of reading but am now a bit stuck; any help would be much appreciated.
I have the two models Events and Areas:
Events have a 'geometry' column which is of type Point
class Event < ActiveRecord::Base
  self.rgeo_factory_generator = RGeo::Geos.factory_generator
end

t.spatial "geometry", :limit => {:srid=>4326, :type=>"point", :geographic=>true}
Areas have a 'geometry' column which is of type Polygon
class Area < ActiveRecord::Base
  self.rgeo_factory_generator = RGeo::Geos.factory_generator
end

t.spatial "geometry", :limit => {:srid=>4326, :type=>"polygon", :geographic=>true}
I can create and plot both events and areas on a google map, and create areas by clicking on a map and saving to the database.
I want to be able to do the following two queries:
@area.events - show all the events in an area
@event.areas - show all the areas a single event is in
I know I might be asking a bit much here, but any help would be much appreciated.
Many thanks
Here's a quick way to do this. These will simply return arrays of ActiveRecord objects.
class Area
  def events
    Event.joins("INNER JOIN areas ON areas.id=#{id} AND st_contains(areas.geometry, events.geometry)").all
  end
end

class Event
  def areas
    Area.joins("INNER JOIN events ON events.id=#{id} AND st_contains(areas.geometry, events.geometry)").all
  end
end
You probably should memoize (cache the result) so that you don't query the database every time you call the method. That should be straightforward; I leave it as an exercise for the reader.
It may be possible to get sophisticated and wrap this up in a true Rails association proxy (so you can get all the Rails association goodies). I haven't looked into this though. It wouldn't be a standard Rails association in any case, because you're not storing IDs.
Twelfth is right: you should create spatial indexes for both tables. Activerecord-postgis-adapter should make those easy to do in your migration.
change_table :events do |t|
  t.index :geometry, :spatial => true
end
change_table :areas do |t|
  t.index :geometry, :spatial => true
end
If you're having trouble with installing postgis, I recently wrote up a bunch of blog entries on this stuff. Check out http://www.daniel-azuma.com/blog/archives/category/tech/georails. I'm also the author of rgeo and activerecord-postgis-adapter themselves, so I'm happy to help if you're stuck on stuff.
This answer will be a bit of a work in progress for you. I'm weak with ruby on rails, but I should be able to help you through the DB section.
You have two tables: Area, which holds a polygon, and Event, which holds the event as a single point (it's a bit more complicated if the event is also an area and you're trying to pick out overlapping areas... if events are single points, this works).
Select *
from area a inner join event e on 1=1
This is going to create a list of every area joined to every event... if you have 500 events and 20 areas, this query will return 10,000 rows. Now you want to filter it so that only events that fall within the area they've been joined to remain. We can use st_contains for this, as st_contains(polygon, point):
where st_contains(a.polygon,e.point) = 't'
If you run this, it should give you a.*, e.* for all events within areas. Now it's just a matter of counting what you want to count.
select a.id, count(1)
from area a inner join event e on 1=1
where st_contains(a.polygon,e.point) = 't'
group by 1
This will give you a list of all your areas (by id) and the count of the events in each. Switching a.id out for e.id will give a list of event ids and the number of areas each one is in.
Unfortunately I have no idea how to express these queries within Ruby, but the DB concepts that you'll need are here...
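One hedged way to run that last query from Rails is to hand the SQL straight to the connection (the table and column names here follow the question's schema, areas/events with a geometry column, so adjust them to whatever you actually have):
# One row per area, with the number of events whose point falls inside it.
rows = ActiveRecord::Base.connection.select_all(<<-SQL)
  SELECT a.id AS area_id, COUNT(*) AS event_count
  FROM areas a
  INNER JOIN events e ON st_contains(a.geometry, e.geometry)
  GROUP BY a.id
SQL
rows.each { |r| puts "area #{r['area_id']}: #{r['event_count']} events" }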
For speed, you should look into the GiST indexing that Postgres has... indexed polygons perform dramatically better.
Edit:
PostGIS is a contrib module that comes with Postgres but is not part of a standard install... you'll need to find and load it. It installs a series of GIS functions in your database, including ST_Contains (functions reside in a database, so make sure you install them in the DB you are using).
The second thing the PostGIS contrib files install is the template_postgis database, which is required for the geometry datatypes (geom as a data type won't exist until this is installed).
Rails 2.3.4
I have searched google, and have not found an answer to my dilemma.
For this discussion, I have two models: Users and Entries. Users can have many Entries (one for each day).
Entries have values and sent_at dates.
I want to query and display the average value of entries for a user BY DAY OF WEEK. So if a user has entered values for, say, the past 3 weeks, I want to show the average value for Sundays, Mondays, etc. In MySQL, it is simple:
SELECT DAYOFWEEK(sent_at) as day, AVG(value) as average FROM entries WHERE user_id = ? GROUP BY 1
That query will return between 0 and 7 records, depending upon how many days a user has had at least one entry.
I've looked at find_by_sql, but while I am searching Entry, I don't want to return an Entry object; instead, I need an array of up to 7 days and averages...
Also, I am concerned a bit about the performance of this, as we would like to load this to the user model when a user logs in, so that it can be displayed on their dashboard. Any advice/pointers are welcome. I am relatively new to Rails.
You can query the database directly, no need to use an actual ActiveRecord object. For example:
ActiveRecord::Base.connection.execute "SELECT DAYOFWEEK(sent_at) as day, AVG(value) as average FROM entries WHERE user_id = #{user.id} GROUP BY DAYOFWEEK(sent_at);"
This will give you a Mysql::Result or Mysql2::Result, which is enumerable, so you can call each (or other Enumerable methods) on it to work through your results.
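To turn those rows into something easy to render, here is a sketch using select_all (which returns an array of hashes keyed by the column aliases) and folding the result into a { day_number => average } hash; the variable names are made up:
rows = ActiveRecord::Base.connection.select_all(
  "SELECT DAYOFWEEK(sent_at) AS day, AVG(value) AS average
   FROM entries WHERE user_id = #{user.id} GROUP BY DAYOFWEEK(sent_at)")

averages_by_day = rows.inject({}) do |memo, row|
  memo[row['day'].to_i] = row['average'].to_f
  memo
end
# => e.g. {1 => 4.2, 2 => 3.8}  (MySQL's DAYOFWEEK: 1 = Sunday ... 7 = Saturday)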
As for caching, I would recommend using memcached, but any other rails caching strategy will work as well. The nice benefit of memcached is that you can have your cache expire after a certain amount of time. For example:
result = Rails.cache.fetch("user/#{user.id}/averages", :expires_in => 1.day) do
  # Your SQL query and results go here
end
This would put your results into memcached for one day under the key "user/#{user.id}/averages". For example, if you were the user with id 10, your averages would be in memcached under 'user/10/averages', and the next time you went to perform this query (within the same day) the cached version would be used instead of actually hitting the database.
Untested, but something like this should work:
@user.entries.select('DAYOFWEEK(sent_at) as day, AVG(value) as average').group('1').all
NOTE: When you use select to specify columns explicitly, the returned objects are read only. Rails can't reliably determine what columns can and can't be modified. In this case, you probably wouldn't try to modify the selected columns, but you can't modify your sent_at or value columns through the resulting objects either.
Check out the ActiveRecord Querying Guide for a breakdown of what's going on here in a fairly newb-friendly format. Oh, and if that query doesn't work, please post back so others that may stumble across this can see that (and I can possibly update).
Since that won't work due to entries returning an array, we can try using join instead:
User.where(:id => params[:id]).joins(:entries).select('...').group('1').all
Again, I don't know if this will work. Usually you can specify where after joins, but I haven't seen select combined in there. A tricky bit here is that the select is probably going to eliminate returning any data about the user at all. It might make more sense just to eschew find_by_* methods in favor of writing a method in the Entry model that just calls your query with select_all (docs) and skips the association mapping.
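A hedged sketch of that last idea (the method name is made up; it returns plain hashes, not Entry objects):
class Entry < ActiveRecord::Base
  # Returns rows like {"day" => "1", "average" => "4.2"} without instantiating Entry objects.
  def self.average_value_by_day_of_week(user_id)
    connection.select_all(sanitize_sql_array([
      "SELECT DAYOFWEEK(sent_at) AS day, AVG(value) AS average
       FROM entries WHERE user_id = ? GROUP BY DAYOFWEEK(sent_at)", user_id]))
  end
end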
I have been searching all over the web and I have no clue.
Suppose you have to build a dashboard in the admin area of your Rails app and you want to have the number of subscriptions per day.
Suppose that you are using SQLite3 for development, MySQL for production (pretty standard setup)
Basically, there are two options :
1) Retrieve all rows from the database using Subscriber.all and aggregate by day in the Rails app using Enumerable#group_by:
@subscribers = Subscriber.all
@subscriptions_per_day = @subscribers.group_by { |s| s.created_at.beginning_of_day }
I think this is a really bad idea. Retrieving all rows from the database can be acceptable for a small application, but it will not scale at all. Database aggregate and date functions to the rescue!
2) Run a SQL query in the database using aggregate and date functions :
Subscriber.select('STRFTIME("%Y-%m-%d", created_at) AS day, COUNT(*) AS subscriptions').group('day')
Which will run this SQL query:
SELECT STRFTIME("%Y-%m-%d", created_at) AS day, COUNT(*) AS subscriptions
FROM subscribers
GROUP BY day
Much better. Now aggregates are done in the database which is optimized for this kind of task, and only one row per day is returned from the database to the Rails app.
... but wait... now the app has to go live in my production env, which uses MySQL!
Replace STRFTIME() with DATE_FORMAT().
What if tomorrow I switch to PostgreSQL ?
Replace DATE_FORMAT() with DATE_TRUNC().
I like to develop with SQLite. Simple and easy.
I also like the idea that Rails is database agnostic.
But why doesn't Rails provide a way to translate SQL functions that do exactly the same thing but have different syntax in each RDBMS? (The difference is really stupid, but hey, it's too late to complain about it.)
I can't believe that I find so few answers on the Web for such a basic feature of a Rails app: counting the subscriptions per day, month, or year.
Tell me I'm missing something :)
EDIT
It's been a few years since I posted this question.
Experience has shown that I should use the same DB for dev and prod. So I now consider the database agnostic requirement irrelevant.
Dev/prod parity FTW.
I ended up writing my own gem. Check it out and feel free to contribute:
https://github.com/lakim/sql_funk
It allows you to make calls like:
Subscriber.count_by("created_at", :group_by => "day")
You speak of some pretty difficult problems that Rails, unfortunately, completely overlooks. The ActiveRecord::Calculations docs are written like they're all you ever need, but databases can do much more advanced things. As Donal Fellows mentioned in his comment, the problem is much trickier than it seems.
I've developed a Rails application over the last two years that makes heavy use of aggregation, and I've tried a few different approaches to the problem. I unfortunately don't have the luxury of ignoring things like daylight saving time because the statistics are "only trends". The calculations I generate are tested by my customers to exact specifications.
To expand upon the problem a bit, I think you'll find that your current solution of grouping by dates is inadequate. It seems like a natural option to use STRFTIME. The primary problem is that it doesn't let you group by arbitrary time periods. If you want to do aggregation by year, month, day, hour, and/or minute, STRFTIME will work fine. If not, you'll find yourself looking for another solution. Another huge problem is that of aggregation upon aggregation. Say, for example, you want to group by month, but you want to do it starting from the 15th of every month. How would you do it using STRFTIME? You'd have to group by each day, then by month, and then somehow account for the starting offset of the 15th day of each month. The final straw is that grouping by STRFTIME necessitates grouping by a string value, which you'll find very slow when performing aggregation upon aggregation.
The most performant and best designed solution I've come to is one based upon integer time periods. Here is an excerpt from one of my mysql queries:
SELECT
  field1, field2, field3,
  CEIL((UNIX_TIMESTAMP(CONVERT_TZ(date, '+0:00', @@session.time_zone)) + :begin_offset) / :time_interval) AS time_period
FROM
  some_table
GROUP BY
  time_period
In this case, :time_interval is the number of seconds in the grouping period (e.g. 86400 for daily) and :begin_offset is the number of seconds to offset the period start. The CONVERT_TZ() business accounts for the way MySQL interprets dates. MySQL always assumes that the date field is in the MySQL local time zone. But because I store times in UTC, I must convert them from UTC to the session time zone if I want the UNIX_TIMESTAMP() function to give me a correct response. The time period ends up being an integer that describes the number of time intervals since the start of Unix time. This solution is much more flexible because it lets you group by arbitrary periods and doesn't require aggregation upon aggregation.
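As a sketch of how you might decode those integers back on the Ruby side (the helper name is made up; the interval and offset must be the same values you bound for :time_interval and :begin_offset):
# A time_period of p covers (p - 1) * interval - offset up to p * interval - offset.
def period_bounds(time_period, time_interval = 86_400, begin_offset = 0)
  period_end = Time.at(time_period * time_interval - begin_offset).utc
  [period_end - time_interval, period_end]
end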
Now, to get to my real point. For a robust solution, I'd recommend that you consider not using Rails at all to generate these queries. The biggest issue is that the performance characteristics and subtleties of aggregation are different across the databases. You might find one design that works well in your development environment but not in production, or vice-versa. You'll jump through a lot of hoops to get Rails to play nicely with both databases in query construction.
Instead I'd recommend that you generate database-specific views in your chosen database and bring those along to the correct environment. Try to model the view as you would any other ActiveRecord table (id's and all), and of course make the fields in the view identical across databases. Because these statistics are read-only queries, you can use a model to back them and pretend like they're full-fledged tables. Just raise an exception if somebody tries to save, create, update, or destroy.
Not only will you get simplified model management by doing things the Rails way, you'll also find that you can write unit tests for your aggregation features in ways you wouldn't dream of in pure SQL. And if you decide to switch databases, you'll have to rewrite those views, but your tests will tell you where you're wrong, and make life so much easier.
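A minimal sketch of such a view-backed, read-only model (the view name daily_subscription_stats is made up; the view itself would be created in a database-specific migration or script):
class DailySubscriptionStat < ActiveRecord::Base
  self.table_name = 'daily_subscription_stats'  # actually a view; use set_table_name on very old Rails

  # These rows are query results only; refuse any write.
  def readonly?
    true
  end

  def destroy
    raise ActiveRecord::ReadOnlyRecord
  end
end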
I just released a gem that allows you to do this easily with MySQL. https://github.com/ankane/groupdate
You should really try to run MySQL in development, too. Your development and production environments should be as close as possible; that leaves less chance for something to work in development and totally break in production.
If db agnosticism is what you're after, I can think of a couple of options:
Create a new field (we'll call it day_str) for the Subscriber that stores either the formatted date or a timestamp and use ActiveRecord.count:
daily_subscriber_counts = Subscriber.count(:group => "day_str")
The trade-off is of course a slightly larger record size, but this would all but eliminate performance worries.
You could also, depending on how granular the data that's being visualized is, just call .count several times with the date set as desired...
((Date.today - 7)..Date.today).each do |d|
  daily_subscriber_counts[d] = Subscriber.count(:conditions => ["created_at >= ? AND created_at < ?", d.to_time, (d + 1).to_time])
end
This could also be customized to account for varying granularities (per month, per year, per day, per hour). It's not the most efficient solution if you want to group by day across all of your subscribers (and I haven't had a chance to run it either), but I would imagine you'd want to group by month, day, or hour if you're viewing a year's, month's, or day's worth of data respectively.
If you're willing to commit to mysql and sqlite you could use...
daily_subscriber_counts = Subscriber.count(:group => "date(created_at)")
...as they share similar date() functions.
I'd refine/expand PBaumann's answer slightly, and include a Dates table in your database. You'd need a join in your query:
SELECT D.DateText AS Day, COUNT(*) AS Subscriptions
FROM subscribers AS S
INNER JOIN Dates AS D ON S.created_at = D.Date
GROUP BY D.DateText
...but you'd have a nicely-formatted value available without calling any functions. With a PK on Dates.Date, you can merge join and it should be very fast.
If you have an international audience, you could use DateTextUS, DateTextGB, DateTextGer, etc., but obviously this would not be a perfect solution.
Another option: cast the date to text on the database side using CONVERT(), which is ANSI and may be available across databases; I'm too lazy to confirm that right now.
Here's how I do it:
I have a class Stat which allows storing raw events.
(Code is from the first few weeks I started coding in Ruby so excuse some of it :-))
class Stat < ActiveRecord::Base
  belongs_to :statable, :polymorphic => true
  attr_accessible :statable_id, :statable_type, :statable_stattype_id, :source_url, :referral_url, :temp_user_guid

  # you can replace this with a cron job for better performance
  # the reason I have it here is because I care about real-time stats
  after_save :aggregate

  def aggregate
    aggregateinterval(1.hour)
    #aggregateinterval(10.minutes)
  end

  # will aggregate an interval with the following properties:
  # take t = 1.hour as an example
  # it's 5:21 pm now, it will aggregate everything between 5 and 6
  # and put them in the interval with start time 5:00 pm and 6:00 pm for today's date
  # if you wish to create a cron job for this, you can specify the start time, and t
  def aggregateinterval(t = 1.hour)
    aggregated_stat = AggregatedStat.where('start_time = ? and end_time = ? and statable_id = ? and statable_type = ? and statable_stattype_id = ?', Time.now.utc.floor(t), Time.now.utc.floor(t) + t, self.statable_id, self.statable_type, self.statable_stattype_id)
    if (aggregated_stat.nil? || aggregated_stat.empty?)
      aggregated_stat = AggregatedStat.new
    else
      aggregated_stat = aggregated_stat.first
    end
    aggregated_stat.statable_id = self.statable_id
    aggregated_stat.statable_type = self.statable_type
    aggregated_stat.statable_stattype_id = self.statable_stattype_id
    aggregated_stat.start_time = Time.now.utc.floor(t)
    aggregated_stat.end_time = Time.now.utc.floor(t) + t
    # in minutes
    aggregated_stat.interval_size = t / 60
    if (!aggregated_stat.count)
      aggregated_stat.count = 0
    end
    aggregated_stat.count = aggregated_stat.count + 1
    aggregated_stat.save
  end
end
And here's the AggregatedStat class:
class AggregatedStat < ActiveRecord::Base
  belongs_to :statable, :polymorphic => true
  attr_accessible :statable_id, :statable_type, :statable_stattype_id, :start_time, :end_time
end
Every statable item that gets added to the db has a statable_type and a statable_stattype_id and some other generic stat data. The statable_type and statable_stattype_id are for the polymorphic classes and can hold values like (the string) "User" and 1, which means you're storing stats about User number 1.
You can add more columns and have mappers in the code extract the right columns when you need them. Creating multiple tables makes things harder to manage.
In the code above, StatableStattypes is just a table that contains "events" you'd like to log... I use a table because prior experience taught me that I don't want to look for what type of stats a number in the database refers to.
class StatableStattype < ActiveRecord::Base
  attr_accessible :name, :description
  has_many :stats
end
Now go to the classes you'd like to have some stats for and do the following:
class User < ActiveRecord::Base
  # first line isn't too useful except for testing
  has_many :stats, :as => :statable, :dependent => :destroy
  has_many :aggregated_stats, :as => :statable, :dependent => :destroy
end
You can then query the aggregated stats for a certain User (or Location in the example below) with this code:
Location.first.aggregated_stats.where("start_time > ?", DateTime.now - 8.month)
I have two models associated with a HABTM (actually using has_many :through on both ends, along with a join table). I need to retrieve all ModelAs that are associated with BOTH of two ModelBs. I do NOT want all ModelAs for ModelB_1 concatenated with all ModelAs for ModelB_2. I literally want all ModelAs that are associated with BOTH ModelB_1 and ModelB_2. It is not limited to only 2 ModelBs; it may be up to 50 ModelBs, so this must scale.
I can describe the problem using a variety of analogies, that I think better describes my problem than the previous paragraph:
* Find all books that were written by all 3 authors together.
* Find all movies that had the following 4 actors in them.
* Find all blog posts that belonged to BOTH the Rails and Ruby categories for each post.
* Find all users that had all 5 of the following tags: funny, thirsty, smart, thoughtful, and quick. (silly example!)
* Find all people that have worked in both San Francisco AND San Jose AND New York AND Paris in their lifetimes.
I've thought of a variety of ways to accomplish this, but they're grossly inefficient and very frowned upon.
Taking an analogy above, say the last one, you could do something like query for all the people in each city, then find items in each array that exist across each array. That's a minimum of 5 queries, all the data of those queries transferred back to the app, and then the app has to intensively compare all 5 arrays to each other (loops galore!). That's nasty, right?
Another possible solution would be to chain the finds on top of each other, which would essentially do the same as above, but won't eliminate the multiple queries and processing. Also, how would you build the chain dynamically if you had user-submitted checkboxes or values that could run to 50 options? Seems dirty. You'd need a loop. And again, that would increase the search time.
Obviously, if possible, we'd like to have the database perform this for us, so, people have suggested to me that I simply put multiple conditions in. Unfortunately, you can only do an OR with HABTM typically.
Another solution I've run across is to use a search engine, like sphinx or UltraSphinx. For my particular situation, I feel this is overkill, and I'd rather avoid it. I still feel there should be a solution that will let a user craft a query for an arbitrary number of ModelBs and find all ModelAs.
How would you solve this problem?
You can do this:
* Build a query from ModelA, joining ModelB (through the join model) and filtering the ModelBs that have one of the values you are looking for, i.e. putting them in OR (where ModelB = 'ModelB_1' OR ModelB = 'ModelB_2'). With this query the result set will have multiple ModelA rows, exactly one row for each ModelB condition satisfied.
* Add a GROUP BY condition to the query on the ModelA columns you need (even all of them if you wish). The count(*) for each group is then equal to the number of ModelB conditions satisfied.
* Add a HAVING condition selecting only the rows whose count(*) is equal to the number of ModelB conditions you need to have satisfied.
example:
model_bs_to_find = [100, 200]

ModelA.all(:joins => {:model_a_to_b => :model_bs},
           :group => "model_as.id",
           :select => "model_as.*",
           :conditions => ["model_bs.id in (?)", model_bs_to_find],
           :having => "count(*)=#{model_bs_to_find.size}")
N.B. the group and select parameters specified in that way will work in MySQL, the standard SQL way to do so would be to put the whole list of model_as columns in both the group and select parameters.
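For reference, a sketch of the same pattern written with Rails 3-style chained scopes (assuming ModelA has a has_many :model_bs, :through association):
model_b_ids = [100, 200]

ModelA.joins(:model_bs)
      .where(:model_bs => { :id => model_b_ids })
      .group('model_as.id')
      .having('COUNT(*) = ?', model_b_ids.size)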