Whenever I want to avoid N+1 queries, I use includes in Rails. But sometimes the included table has a large number of fields.
So to avoid performance issues, what I do instead is a join, selecting only the required fields from the associated table.
But this does not feel right at all. It feels like I'm violating something, although I'm not sure what.
I understand that when we do includes, we cannot specify the select clause. How do we achieve this without doing a join and polluting(?) the objects?
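For context, a rough sketch of the two patterns described above (Post/Author and the column names are made-up examples, not from the question):
# includes: avoids N+1, but loads every column of the associated table
Post.includes(:author).each { |post| post.author.name }

# joins + select: one query and only the columns you ask for, but the values
# arrive as extra attributes on Post rather than on a real Author object
Post.joins(:author)
    .select("posts.*, authors.name AS author_name")
    .each { |post| post.author_name }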
There is no "nice" Rails way to do this.
Like you said, it isn't possible, and also not useful, to combine includes and select, because ActiveRecord::Base wraps every query result in an object. Loading only id and name while leaving out description and user_id, for example, would put you in a situation where you might lose the contents of description and user_id when saving the incomplete object back to the database.
So you can only do this using plain SQL, bypassing ActiveRecord's object mapping, so you aren't in danger of losing data. You can do this through the database connection established by ActiveRecord, like this:
ActiveRecord::Base.connection.execute("SELECT id, name FROM table_name WHERE <conditions>").each do |row|
  row["id"] # access the id of each result row
end
ActiveRecord::Base.connection holds a handle to the connection established using the access data in your config/database.yml, so you can reuse the same database connection independently of any model.
Related
I'm developing a Rails application where I use single table inheritance. There's an ActiveRecord model, User, which has a subclass Attendee. The latter has two subclasses, Host and Guest. Let's say that I'd like to find the attendees, both guests and hosts. The query would be as simple as this:
Attendee.all
However, this is not retrieving any result. Taking a look at the SQL generated:
SELECT "attendees".* FROM "attendees" WHERE "attendees"."type" IN ('Attendee')
It doesn't find any result, because all the records are stored using either Guest or Host classes, so there won't be any record with the type Attendee.
Right after, I look for all the guests using Guest.all, and it does retrieve records. If I look for attendees again, the SQL query changes to:
SELECT "attendees".* FROM "attendees" WHERE "attendees"."type" IN ('Attendee', 'Guest')
So it seems that Rails doesn't know about the subclasses until you use them. Do you know the reason for this behavior and how to correct it?
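One common workaround is to make sure the subclasses are loaded before the query runs, so Rails can include them in the type condition. A minimal sketch, assuming Rails autoloading; the initializer file name is arbitrary:
# config/initializers/sti_preload.rb
# Referencing the constants forces Guest and Host to load, so Attendee.all
# generates ... WHERE "attendees"."type" IN ('Attendee', 'Guest', 'Host')
Rails.application.config.to_prepare do
  Guest
  Host
end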
I have an application with some basic entities:
Posts
posts have:
Likes
Comments
and Ratings
I then have an SQL view to query for all three. With that I have a model called something like PostActivityView. A post has an activity view so I can call
#post.activity_view
which returns a collection of the appropriate values (from Likes, Comments, and Ratings). This all works correctly.
My issue is that this returns a collection of hashmaps, not Comments, Likes, and Ratings. This makes sense because my view is creating a new "with PostEvents as (...)" result. My question: is there a way to generalize these results and represent them with an ActiveRecord object?
Likes, Comments, and Ratings have different attributes, so I do some aliasing in the view (comments have comment.body for text and ratings have rating.comments for text, so where needed I alias something like rating.comments to body). So my results all have the same attributes. It seems like I should be able to make an ActiveRecord object like PostEvent which just has the aliased columns. Is this possible?
I don't know how to do what you're describing. However, do you really need to store them in separate tables? You could keep them all in a single table and use single table inheritance (http://api.rubyonrails.org/classes/ActiveRecord/Base.html#label-Single+table+inheritance) to have separate classes (Likes, Comments, or Ratings) for each type of thing a particular row represents. Then the common stuff could sit up in the parent class, and the stuff specific to the more granular things could go into the descendant classes.
It sounds like your situation is the opposite of that and you're combining separate tables into a single union. I suspect that'd be very difficult to implement in ActiveRecord itself as different databases have different rules for how and when the contents of a database view may be modified (i.e., if you could somehow create an AR class that referenced your view the way you're proposing, what would happen when you call save?)
It sounds like you've gone down the path of providing a view to make it convenient to retrieve all of these objects in one set as a single type of object, when your requirement is really to bring back different objects.
Based on that I'd question the use of the view at all. I'm not anti-view you understand -- we use them quite a lot for producing read-only reports in our application for performance reasons -- but if you need the rows to be returned as their proper object type then I'd retrieve them separately as Likes, Comments, and Ratings.
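A sketch of that approach, assuming Post has the usual has_many :likes, :comments, and :ratings associations and each record has a created_at column:
# Retrieve each type as its real class, then merge them in Ruby
activity = (post.likes.to_a + post.comments.to_a + post.ratings.to_a)
             .sort_by(&:created_at)
             .reverse
activity.each { |event| puts "#{event.class.name}: #{event.created_at}" }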
A first solution would be to use the scenic gem and create an activity_views view using a union query:
create view activity_views as (
  select ...
  from likes
  union
  select ...
  from comments
  union
  select ...
  from ratings
)
Your data needs to be homogeneous, of course.
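With scenic, that SQL would typically live in a versioned file under db/views, with a migration and a read-only model on top. A rough sketch (untested; file names follow scenic's conventions and PostEvent is the name from the question):
# db/migrate/20xx_create_activity_views.rb
class CreateActivityViews < ActiveRecord::Migration  # newer Rails needs a version tag, e.g. [5.0]
  def change
    create_view :activity_views   # reads db/views/activity_views_v01.sql
  end
end

# app/models/post_event.rb
class PostEvent < ActiveRecord::Base
  self.table_name = "activity_views"

  belongs_to :post

  def readonly?
    true   # the view is a union; don't try to save through it
  end
end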
I know how to call SQL to select data from already defined Models, like so:
Friend.find_by_sql(["...."])
My question is what do I do if I need information that isn't defined by a model?
I have a Meal table, Friend table, and Component table. Components make up Meals, and Friends can be allergic to Components. I have a SQL command (which I can't post here due to confidentiality reasons, but the implementation isn't that relevant anyway) that returns a friend_id and component_id, given a Meal. That is, it returns a list of rows (with two columns each, friend_id and component_id) telling me which Friends are allergic to which Components in a given Meal. But I don't know how to store this in a variable in Ruby and access that information.
To give some pseudocode to give you an idea of what I want to do:
#allergies_for_a_meal = ....<INSERT SQL QUERY HERE>...
#friends_who_are_allergic = Friends.find_by_id(#allergies_for_a_meal.friend_id)
Can someone give me an idea of the proper syntax for this?
If you're inside an ActiveRecord::Base subclass (such as a model) then you'll have access to the current database connection through connection; the connection has useful methods like select_rows:
a = connection.select_rows('select id from ...').map { |r| r[0].to_i }
If you're not inside a model class then you really shouldn't be directly messing with the database but you can use ActiveRecord::Base.connection if you must.
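Applied to the allergy example above, that might look like this (a sketch; the actual SQL stays a placeholder since it wasn't shown):
allergy_sql = "<your query returning friend_id, component_id for the given meal>"

# select_rows returns an array of rows, each row an array of column values
allergies = ActiveRecord::Base.connection.select_rows(allergy_sql).map do |friend_id, component_id|
  { friend_id: friend_id.to_i, component_id: component_id.to_i }
end

allergic_friends = Friend.where(id: allergies.map { |a| a[:friend_id] }.uniq)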
You can create a SQL View in your database, then create a read-only ActiveRecord model to read data from it.
That way:
your complex SQL is stored in the DBMS (and parsed only once)
you can use all of the usual ActiveRecord accessors and finders to access the data
you can add additional find constraints on top of the view's SQL
Note that some Views can be written to, depending on the View's SQL, DBMS, engine, etc. But I recommend against that.
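For the allergy query above, that could look like this (a sketch; the meal_allergies view name and its columns are assumptions):
class MealAllergy < ActiveRecord::Base
  self.table_name = "meal_allergies"   # the database view

  belongs_to :friend
  belongs_to :component

  def readonly?
    true
  end
end

# e.g. MealAllergy.where(meal_id: meal.id).includes(:friend).map(&:friend)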
Is eager fetch the same as join fetch?
I mean whether eagerly fetching a has-many relation fires 2 queries or a single join query?
How does Rails ActiveRecord implement a join fetch of associations when it doesn't know the table's metadata up front (I mean the columns in the table)? Say, for example, I have
people - id, name
things - id, person_id, name
A person has a one-to-many relation with things. So how does it generate the query with all the column aliases even though it cannot know them when I do a join fetch on people?
An answer hasn't been accepted so I will try to answer your questions as I understand them:
"how does it know all the fields available in a table?"
It does a SQL query for every class that inherits from ActiveRecord::Base. If the class is 'Dog', it will do a query to find the column names of the table 'dogs'. In production mode it should only do this query once per run of the server -- in development mode it does it a lot. The query will differ depending on the database you use, and it is usually an expensive query.
"Say if i have a same name for column in a table and in an associated table how does it resolve this?"
If you are doing a join, it generates SQL using the table names as prefixes to avoid ambiguities. In fact, if you are doing a join in Rails and want to add a condition (using custom SQL) on name, but both the main table and the join table have a name column, you need to specify the table name in your SQL (e.g. Human.joins(:pets).where("humans.name = 'John'")).
"I mean whether eagerly fetching a has-many relation fires 2 queries or a single join query?"
Different Rails versions behave differently. I think that early versions did a single join query at all times. Later versions would sometimes do multiple queries and sometimes a single join query, based on the realization that a single join query isn't always as performant as multiple queries. I'm not sure of the exact logic it uses to decide. Recently, in Rails 3, I am seeing multiple queries happening in my current codebase -- but maybe it sometimes does a join as well, I'm not sure.
It knows the columns through a type of reflection. Ruby is very flexible and allows you to build functionality that will be used/defined during runtime and doesn't need to be stated ahead of time. It learns the associated "person_id" column by interpreting the "belongs_to :person" and knowing that "person_id" is the field that would be associated and the table would be called "people".
If you do Person.includes(:things) then it will generate 2 queries: one that gets the people, and a second that gets the things belonging to those people.
http://guides.rubyonrails.org/active_record_querying.html
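In Rails 4 and later you can also pick the strategy explicitly instead of letting includes decide (a sketch; Person and Thing are the models implied by the tables above):
Person.preload(:things)                        # always two queries: people, then things WHERE person_id IN (...)
Person.eager_load(:things)                     # always one LEFT OUTER JOIN, with aliased columns like t0_r0, t1_r1, ...
Person.includes(:things).references(:things)   # includes, forced into the join form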
My application has one table called 'events' and each event has approx 30 standard fields, but also user defined fields that could be any name or type, in an 'eventdata' table. Users can define these event data tables, by specifying x number of fields (either text/double/datetime/boolean) and the names of these fields. This 'eventdata' (table) can be different for each 'event'.
My current approach is to create a lookup table for the definitions. So if I need to query all 'event' and 'eventdata' per record, I do so in a master-detail relationship using two queries (i.e. select * from events, then for each record in 'events', select * from 'some table').
Is there a better approach to doing this? I have implemented this so far, but most of my queries require two distinct calls to the DB - I cannot simply join my master 'events' table with different 'eventdata' tables for each record in 'events'.
I guess my main question is: can I join my master table with different detail tables for each record?
E.g.
SELECT E.*, E.Tablename
FROM events E
LEFT JOIN 'E.tablename' T ON E._ID = T.ID
If not, is there a better way to design my database, considering I have no idea how many user-defined fields there may be or what type they will be.
There are four ways of handling this.
Add several additional fields named "Custom1", "Custom2", "Custom3", etc. These should have a datatype of varchar(?) or similar.
Add a field to hold the unstructured data (like an XML column).
Create a table of name/value pairs which are associated with some type of template. Let them manage the template. You'll have to use pivot tables or similar to get the data out.
Use a database like MongoDB or another NoSQL-style product to store this.
That said, the first one has the advantage of being fast but limits the number of custom fields to the number you defined. Older mainframe-type applications work this way; SalesForce CRM used to.
The second option means that each record can have its own custom fields. However, depending on your database, there are definite challenges here. Tried this, don't recommend it (a sketch follows below).
The third one is generally harder to code for but allows for extreme flexibility. SalesForce and other applications have gone this route; including a couple I'm responsible for. The downside is that Microsoft apparently acquired a patent on doing things this way and is in the process of suing a few companies over it. Personally, I think that's bullcrap; but whatever. Point is, use at your own risk.
The fourth option is interesting. We've played with it a bit and the performance is great while coding is pretty darn simple. This might be your best bet for the unstructured data.
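For what it's worth, if this is a Rails application, option 2 can be as small as a serialized text column (a sketch; the custom_fields column name is an assumption):
class Event < ActiveRecord::Base
  serialize :custom_fields, JSON   # assumes a text column named custom_fields on events
end

event = Event.new(custom_fields: { "severity" => "high", "reported_on" => "2013-05-01" })
event.save!
event.custom_fields["severity"]   # => "high"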
Those types of joins won't work, because you would need to pivot the eventdata table to turn its rows into columns. How to do that depends on which database technology you are using.
Here is an example with MySQL: How to pivot a MySQL entity-attribute-value schema
My approach would be to avoid using a different table for each event, if that's possible.
I would use something like:
Event (EventId, ..., ...)
EventColumnType (EventColumnTypeId, EventTypeId, ColumnName)
EventColumnData (EventColumnDataId, EventId, EventColumnTypeId, Data)
You are then limited in the type of data you can store (everything would have to be strings, for example), but the number of events and columns is unrestricted.
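Reading the custom values back is then a join through the type table. A sketch in ActiveRecord, with model names derived from the tables above:
class Event < ActiveRecord::Base
  has_many :event_column_data, class_name: "EventColumnData"
end

class EventColumnType < ActiveRecord::Base
end

class EventColumnData < ActiveRecord::Base
  belongs_to :event
  belongs_to :event_column_type
end

event = Event.includes(event_column_data: :event_column_type).find(42)
custom = event.event_column_data.each_with_object({}) do |datum, hash|
  hash[datum.event_column_type.column_name] = datum.data
end
# => e.g. { "severity" => "high", "reported_on" => "2013-05-01" }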
What I'm getting from your description is you have an event table, and then a separate EventData table for each and every event.
Rather than that, why not have a single EventCustomFields table that contains a foreign key to the event table, a field Name (event+field being the PK) and a field value.
Sure, it's not the best. You'd be stuck serializing the value or storing everything as a string. And you'd still be stuck doing two queries, one for the event table and one to get its custom fields, but at least you wouldn't have a new table for every event in the system (yuck x10).
Another (arguably worse) option is to serialize the custom fields into a single column and then deserialize them when you need them. So your query would be something like
Select E.*, C.*
From events E, customFields C
Where E.ID = C.ID
Is it possible to just impose a limit on your users? I know the tables underneath SharePoint 2007 had a bunch of columns for custom data that were just named like CustomString1, CustomDate2, etc. That may end up easier than some of the approaches above, where everything is in one column (though that's an approach I've taken as well), and I would think it would scale up better.
The answer to your main question is: no. You can't have different rows in the result set with different columns. The result set is kind of like a table, so each row has to have the same columns. You can fake it with padding and dummy columns, but that's probably not much better.
You could try defining a fixed event data table, with (say) ten of each type of column. Then you'd store the usage metadata in a separate table and just read that in at system startup. The metadata would tell you that event type "foo" has a field "name" mapped to column string0 in the event data table, a field named "reporter" mapped to column string1, and a field named "reportDate" mapped to column date0. It's ugly and wastes space, but it's reasonably flexible. If you're in charge of the database, you can even define a view on the table so to the client it looks like a "normal" table. If the clients create their own tables and just stick the table name in the event record, then obviously this won't fly.
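The metadata lookup itself is tiny. A sketch of the mapping idea in Ruby (FieldMapping, EventData, and their columns are assumptions illustrating the layout described above):
# logical field name -> physical column, for event type "foo"
mapping = {}
FieldMapping.where(event_type: "foo").each do |m|
  mapping[m.field_name] = m.column_name
end
# => { "name" => "string0", "reporter" => "string1", "reportDate" => "date0" }

row = EventData.where(event_id: event.id).first
report_date = row[mapping["reportDate"]]   # reads the date0 column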
If you're really hardcore you can write a database procedure to query the table structures and serialize everything to a list of key/type/value tuples and return that in one long string as the last column, but that's probably not much handier than what you're doing now.