How to make ORM (ActiveRecord) Models for union queries in Rails - sql

I have an application with some basic entities: Posts, which have Likes, Comments, and Ratings.
I then have an SQL view that queries all three, and with that a model called something like PostActivityView. A post has an activity view, so I can call
#post.activity_view
which returns a collection of the appropriate values (from Likes, Comments, and Ratings). This all works correctly.
My issue is that this returns a collection of hashes, not Comments, Likes, and Ratings. That makes sense, because my view creates a new "with PostEvents as (...)" result. My question: is there a way to generalize these results and represent them with an ActiveRecord object?
Likes, Comments, and Ratings have different attributes, so I do some aliasing in the view (comments have comment.body for text and ratings have rating.comments for text, so where needed I rename something like rating.comments to body). As a result, all my rows have the same attributes. It seems like I should be able to make an ActiveRecord model like PostEvent which just has the aliased columns. Is this possible?

I don't know how to do exactly what you're describing. However, do you really need to store them in separate tables? You could keep them all in a single table and use single table inheritance (http://api.rubyonrails.org/classes/ActiveRecord/Base.html#label-Single+table+inheritance) to have a separate class (Like, Comment, or Rating) for each type of thing a row represents. The common attributes would then sit in the parent class, and the attributes specific to the more granular things would go into the descendant classes.
It sounds like your situation is the opposite of that and you're combining separate tables into a single union. I suspect that'd be very difficult to implement in ActiveRecord itself as different databases have different rules for how and when the contents of a database view may be modified (i.e., if you could somehow create an AR class that referenced your view the way you're proposing, what would happen when you call save?)

It sounds like you've gone down the path of providing a view to make it convenient to retrieve all of these objects in one set as a single type of object, when your requirement is really to bring back different objects.
Based on that I'd question the use of the view at all. I'm not anti-view you understand -- we use them quite a lot for producing read-only reports in our application for performance reasons -- but if you need the rows to be returned as their proper object type then I'd retrieve them separately as Likes, Comments, and Ratings.

One solution would be to use the scenic gem and create an activity_views view using a union query:
create view activity_views
as (
select ...
from likes
union
select ...
from comments
union
select ...
from ratings
)
Your data need to be homogeneous, of course: each select in the union must return the same columns.
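With scenic, the view definition lives in a versioned SQL file and a plain read-only ActiveRecord model sits on top of it. A minimal sketch, assuming Rails with the scenic gem; the model, association, and file names are illustrative:

```ruby
# db/migrate/20240101000000_create_activity_views.rb
# Scenic reads the union SQL from db/views/activity_views_v01.sql
class CreateActivityViews < ActiveRecord::Migration[7.0]
  def change
    create_view :activity_views
  end
end

# app/models/activity_view.rb
class ActivityView < ApplicationRecord
  belongs_to :post

  # The view should never be written through ActiveRecord
  def readonly?
    true
  end
end

# app/models/post.rb
class Post < ApplicationRecord
  has_many :activity_views
end
```

post.activity_views then returns real ActivityView instances carrying the aliased columns, whichever source table each row came from.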

Related

SQL - When using an ORM, does it sometimes make sense to not use a pivot table for many_to_many relationships?

For the following hypothetical use case, I'm trying to understand why it may be desirable to have a pivot table instead of an alternative solution (outlined below).
Hypothetical Use Case
Let’s say that a movie has many actors and that an actor can belong to more than one movie.
"Standard" Pivot Table Solution
As outlined in this lesson (using Elixir's Ecto library), the "standard" solution recommends using a movies_actors pivot table, and both the movies and actors tables reference this movies_actors table.
Alternative Solution
Instead, could we achieve the same result by having the concept of a list of ids?
actor belongs to one or more movies by having the actors table include a movie_ids field (which is a list)
movie has many actors by having the movies table include a actor_ids field (which is a list)
Question
Is one solution preferable? Why?
The table you are referring to is more typically called a "junction" table or "association" table. It is the standard way to implement many-to-many relationships.
Junction tables have some key advantages; notably, they guarantee data integrity when foreign keys are properly defined.
But that is not your question. Are other representations appropriate under some circumstances? I would say that Postgres provides powerful functionality through arrays and JSON which make them feasible for many-to-many relationships. In particular, Postgres supports indexes on arrays and JSON, overcoming one of the big hurdles of such a relationship.
When would such a list be appropriate? I don't think it is appropriate for Actors. That is an entity in its own right and there is lots of additional information you want about an actor.
But it might be appropriate for something like user-generated tags, particularly tags where you don't feel a need to maintain a master list (and don't care about misspellings). It might be appropriate for alternative names for something (assuming you don't want disjoint names across rows).
I think you should not use the "alternative solution" of storing arrays of referenced ids to model a many-to-many relationship. It seems simpler at first glance, but it will hurt you later.
You should write a simple test case for both scenarios and create test tables with a realistic number of entries and relationships (it doesn't matter if the data are artificial and repeating). Then try to write a join between the two tables. You will find that with the "alternative solution", the query looks much more complicated (at best, it will involve strange operators like #>) and doesn't perform as well (you can only get a nested loop join).
There is a good reason to keep data in the first normal form – it is better adapted to the way relational databases process data.
Of course this "normal form" stuff has to be taken with a grain of salt: it is fine to use an array to store data, as long as you don't use individual array entries in your query processing. But by joining over array elements you certainly step over that line.
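A framework-free sketch of the trade-off discussed above, in plain Ruby with hypothetical data: with a junction table both directions of the relationship are symmetric row selections, while with id arrays one direction forces a scan over every row (which is what the database ends up doing too):

```ruby
# Junction-table style: one row per (movie, actor) pair.
castings = [
  [:godfather, :pacino],
  [:godfather, :brando],
  [:heat,      :pacino]
]

# Both directions are simple selections over the same rows.
actors_in   = ->(movie) { castings.select { |m, _| m == movie }.map { |_, a| a } }
movies_with = ->(actor) { castings.select { |_, a| a == actor }.map { |m, _| m } }

# Array-of-ids style: each movie row carries an actor_ids list.
movies = {
  godfather: [:pacino, :brando],
  heat:      [:pacino]
}

# One direction is easy...
actors_in2 = ->(movie) { movies[movie] }
# ...but the reverse requires scanning every row and probing each array.
movies_with2 = ->(actor) { movies.select { |_, ids| ids.include?(actor) }.keys }

p actors_in.(:godfather)   # => [:pacino, :brando]
p movies_with2.(:pacino)   # => [:godfather, :heat]
```

The junction version also lets the database index both columns independently, which is exactly what a nested-loop-only array join cannot exploit.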

SQL vs NoSQL for data that will be presented to a user after multiple filters have been added

I am about to embark on a project for work that is very outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of using each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Fowler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs, but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved by using two separate pieces of technology. The first is a relatively well-designed database schema on a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response out of storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead look into using Solr/Lucene as your full text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data if you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have few points that can help you address your questions regarding the relational database structure.
The first thing I see right away is that you are talking about inheritance (at least conceptually). Your objects inherit from each other, so derived objects have additional attributes. Say you are adding a new type of object: the first thing you do (conceptually) is find a base/super (parent) object type for it that has a subset of the attributes, and then add on top of them (extending the base object type).
Once you are used to thinking that way, the next question is which inheritance mapping pattern to use for a relational database. I'll borrow terms from Martin Fowler to describe them.
You can hold inheritance chain in the database by following one of the 3 ways:
1 - Single table inheritance: Whole inheritance chain is in one table. So, all new types of objects go into the same table.
Advantages: your search query has only one table to touch, which should be faster than a join, for example.
Disadvantages: the table grows faster than with option 2, for example; you have to add a type column that says what type of object each row is; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: Separate table for each new type of object.
Advantages: if a search affects only one type, you search only one table at a time; each table grows more slowly than in option 1, for example.
Disadvantages: you need a union of queries when searching several types at the same time.
3 - Class table inheritance: One table for the base type object with its attributes only, additional tables with additional attributes for each child object type. So, child tables refer to the base table with PK/FK relations.
Advantages: all types are present in one table so easy to search all together using common attributes.
Disadvantages: the base table grows quickly because it holds part of every child type's data too; you need joins to search all types of objects with all their attributes.
Which one to choose?
It's a trade-off, obviously. If you expect many types of objects to be added, I would go with concrete table inheritance, which gives reasonable query and scaling options. Class table inheritance doesn't lend itself to fast queries and scalability. Single table inheritance works better with a small number of types.
Your call, my friend!
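Option 1 above can be sketched without any framework: a single collection of rows, a type discriminator column, and a small factory that instantiates the right class per row. Plain Ruby with hypothetical class and column names; the factory step is what ActiveRecord's STI does for you via the `type` column:

```ruby
# One "table": every row has a type discriminator plus a sparse set of columns.
ROWS = [
  { type: "Book",  name: "SQL 101", pages: 320, color: nil   },
  { type: "Chair", name: "Stool",   pages: nil, color: "oak" }
]

class Item
  attr_reader :name
  def initialize(row)
    @name = row[:name]
  end
end

class Book < Item
  attr_reader :pages
  def initialize(row)
    super
    @pages = row[:pages]  # column unused by Chair rows
  end
end

class Chair < Item
  attr_reader :color
  def initialize(row)
    super
    @color = row[:color]  # column unused by Book rows
  end
end

# Map the discriminator to a class, one object per row.
TYPES = { "Book" => Book, "Chair" => Chair }
items = ROWS.map { |row| TYPES.fetch(row[:type]).new(row) }

p items.map(&:class)  # => [Book, Chair]
p items.first.pages   # => 320
```

Note how the nil columns make the single-table disadvantage visible: each row carries slots for every type's attributes.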
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three-table set. You will see it referred to as entity-attribute-value (EAV) logic on the web; it's a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 6 attributes; the same approach works for hundreds of products and thousands of attributes. The standard way of holding this in one table requires the product info along with 6 columns to store the data (in this setup, at least a third of them are null). Adding a new attribute means altering the table to add another column, then coming up with a script to populate existing rows or just leaving the column null for them. Not the most fun; it can be a headache.
The alternative to this is a name/value pair setup. You want a 'header' table to hold the values common to all your products (like name or price: things that all products always have). In the example above, you'll notice that attribute 'a' is used on every record, which means attribute 'a' can be part of the header table as well. We'll call the key column here 'header_id'.
The second table is a reference table that simply stores the attributes that can be assigned to each product, assigning an ID to each. We'll call the table attribute, with attr_id as its key. Rather straightforward: each attribute above becomes one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what each attribute means. In the future, you'll add a row to this table to open up a new attribute for the headers.
The final table is a mapping table that actually holds the info. It has your product id, the attribute id, and then the value. It's normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, attribute, value? Any future product can have any combination of attributes stored in this table. Adding a new attribute means adding a row to the attribute table and then populating the detail table as needed.
I believe there is a wiki article for it too: http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply a matter of figuring out the best way to pivot your data back out (I'd recommend Postgres as an open-source DB option here).
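The pivot step can be sketched in plain Ruby over the detail rows from the example above (hypothetical data; in SQL you'd get the same shape with conditional aggregates or Postgres's crosstab):

```ruby
# Detail rows: [product_id, attribute, value] -- the name/value pair table.
details = [
  ["prd1", "b", "5 mins"],
  ["prd1", "c", "needs spare jack"],
  ["prd2", "d", "misc text"],
  ["prd3", "b", "15 mins"]
]

# Pivot: one hash per product, with attributes turned into keys.
pivoted = details
  .group_by { |prd, _, _| prd }
  .transform_values { |rows| rows.to_h { |_, attr, val| [attr, val] } }

p pivoted["prd1"]  # => {"b"=>"5 mins", "c"=>"needs spare jack"}
```

Each product ends up with exactly the attributes it has, and adding a new attribute never changes this code.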

Delimited string of ids as a field or a separate table?

I have a database in which I store a large amount of user-created products.
The user can also create "views" which work as containers holding these products; let's call them categories.
It would be simple if each product had a category-field containing the id of the category, however each product can be added to multiple categories so that doesn't work.
Presently I solve this by having a string-field "products" in the category-table which is a comma-separated list of product-ids.
What I'm wondering is basically if it's "okay" to do it this way? Is it generally accepted? Will it cause some kind of problem I'm not realizing?
Would it be better to create another table named something like productsInCategories which has 2 fields, one with a category-id and one with product-id and link them together this way?
Will one of these methods perform better or be better in some other way?
I'm using sqlce at the moment if that matters, but that will most likely change soon.
I would go for the second option: a separate table.
It makes things easier if you need to query from the product perspective, and the join to the categories will be simple and fast. This is exactly what relational databases are made for.
Imagine a simple query like "which categories is a product in?". With your solution you need to check every category one by one, parsing each category's CSV list to find the product. With a separate table it is one clean query.
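To make that concrete, here is what the two lookups look like in plain Ruby with hypothetical data: the comma-separated field has to be parsed on every row, while the mapping table is a direct selection:

```ruby
# Option 1: comma-separated product ids stored on each category row.
categories_csv = {
  "summer" => "1,4,7",
  "sale"   => "4,9"
}

# "Which categories contain product 4?" means parsing every row's string.
cats_for = ->(pid) do
  categories_csv.select { |_, csv| csv.split(",").map(&:to_i).include?(pid) }.keys
end

# Option 2: a productsInCategories mapping table -- one row per pair.
products_in_categories = [
  ["summer", 1], ["summer", 4], ["summer", 7],
  ["sale", 4], ["sale", 9]
]

cats_for2 = ->(pid) { products_in_categories.select { |_, prod| prod == pid }.map(&:first) }

p cats_for.(4)   # => ["summer", "sale"]
p cats_for2.(4)  # => ["summer", "sale"]
```

Beyond speed, the mapping table lets the database index both columns and enforce foreign keys; with the CSV field, removing a product from a category is string surgery with no integrity checks.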

SQL call in Rails, return object that isn't defined by a model

I know how to call SQL to select data from already defined Models, like so:
Friend.find_by_sql(["...."])
My question is what do I do if I need information that isn't defined by a model?
I have a Meal table, Friend table, and Component table. Components make up Meals, and Friends can be allergic to Components. I have a SQL command (which I can't post here due to confidentiality reasons, but the implementation isn't that relevant anyway) that returns a friend_id and component_id, given a Meal. That is, it returns a list of rows (with two columns each, friend_id and component_id) telling me which Friends are allergic to which Components in a given Meal. But I don't know how to store this in a variable in Ruby and access that information.
To give some pseudocode to give you an idea of what I want to do:
@allergies_for_a_meal = ....<INSERT SQL QUERY HERE>...
@friends_who_are_allergic = Friend.find_by_id(@allergies_for_a_meal.friend_id)
Can someone give me an idea of the proper syntax for this?
If you're inside an ActiveRecord::Base subclass (such as a model) then you'll have access to the current database connection through connection; the connection has useful methods like select_rows:
a = connection.select_rows('select id from ...').map { |r| r[0].to_i }
If you're not inside a model class then you really shouldn't be directly messing with the database but you can use ActiveRecord::Base.connection if you must.
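Since select_rows returns plain arrays rather than model objects, a lightweight way to get named access is to wrap each row in a Struct. A sketch with hypothetical literal rows standing in for the query result; the Struct name is illustrative:

```ruby
# Rows as connection.select_rows would return them: [friend_id, component_id],
# with every value as a string.
rows = [["1", "12"], ["1", "15"], ["3", "12"]]

Allergy = Struct.new(:friend_id, :component_id)
allergies = rows.map { |f, c| Allergy.new(f.to_i, c.to_i) }

# Named access instead of r[0] / r[1]:
friend_ids = allergies.map(&:friend_id).uniq
p friend_ids  # => [1, 3]

# In Rails you could then load the real models in one query:
# Friend.where(id: friend_ids)
```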
You can create a SQL View in your database, then create a read-only ActiveRecord model to read data from it.
That way:
your complex sql is stored in the dbms (and parsed only once).
you can use all of the usual ActiveRecord accessors and finders to access the data
you can add additional Find constraints to the View's SQL
Note that some Views can be written to, depending on the View's SQL, DBMS, engine, etc. But I recommend against that.

SQL: Best practice to store various fields in one table

I want to design a table for items.
There are many types of items, all share several fields.
Each type of item has its own fields.
I want to store the uncommon fields in a separate table.
I thought of something like :
----Items
+Item_id
+Item_Type_Id
+Item_Serial
...
----Item_types
+Item_Type_Id
+Item_Name
...
----Item_Fields
+Item_Field_Id
+Item_Type_Id
+Field_Name
...
----Field_Values
+Field_Value_Id
+Item_Field_Id
+Item_Id
+Value
...
The pro is having the ability to add fields and values without changing the tables.
The con is that I have to transpose the field names and values in order to see all the info for an item.
Any better suggestions? Or perhaps a simple (non-stored-procedure) way to join the tables to get flat info?
I tried to use PIVOT (I'm using SQL 2005) but with no luck.
Thanks.
I wrote a stored proc to make PIVOT more useful. Here is the source:
http://dot-dash-dot.com/files/pivot_query.sql
and some examples how to use it:
http://dot-dash-dot.com/files/pivot_query_examples.sql
For your data, the query would just be the raw data joining those tables above to produce a raw listing of:
set @myQuery = '
Select Item_Id, Item_Name, Field_Name, Value From ...
';
Then your call to pivot_query would be:
exec pivot_query @myQuery, 'Item_Id, Item_Name', 'Field_Name', 'max(Value)'
like that.
One other option is to store items in XML format in one single field. Depending on your usage scenario, it may work well. Or it may not.
I believe there has to be some grouping of values.
For example, let's say your items are objects in a room. Different types of objects have different attributes: books have a publication date and number of pages, chairs have a color pattern and height, etc.
In this example, you make an item table, a book table and a chair table.
You could make an "additional values" table that holds generic information as above, but what you really want to do is figure out the "types" of the different groups of attributes and then make every one of those types it's own table.
Is there a set of values that all items have? There has to be at least one, which is a type field (this describes where the other information is stored). I expect every item will also have a name and a description. This is the information that goes in the item table.
Then you make additional tables for the different types: itembook, itemchair, etc. There may even be some overlap; for example itembook, itemhardback, and itempaperback could be 3 tables used to describe books.
I believe this is the best solution to your problem. It will still allow you to extend, but it does put a framework around your data.
Of course there are systems that do it the way you describe, but unless you are building a tool that others will reuse across many different projects, it makes sense to design the system for the task at hand. Otherwise you end up falling into the over-designing trap. (IMHO)
On the other hand, if you do go in the totally generic direction, I suggest you use one of the systems that already work this way (entity framework, app framework, etc.). Use someone else's rather than starting from scratch.
I'm not too sure how you want to retrieve the info, but something like the below may work. (It's probably close to what Hogan mentioned.)
If you want to retrieve data for a type, you can just JOIN two tables.
If you want to retrieve data for all types (with all fields), you can LEFT JOIN all tables.
----Items
+Item_id
+Item_Type_Id
+Item_Common_Field1
+Item_Common_Field2
...
----Item_Type_A
+Item_id
+Item_Type_A_Specific_Field1
+Item_Type_A_Specific_Field2
...
----Item_Type_B
+Item_id
+Item_Type_B_Specific_Field1
...
If you add these columns to the table, you can make them sparse columns to avoid the space taken by unspecified uncommon fields.
But I would not call this a best practice. (see comments under your question)
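The retrieval described above can be sketched in plain Ruby with hypothetical data: each type-specific "table" is keyed by item id, and the LEFT JOIN is a merge that tolerates missing rows on the type side:

```ruby
# Common table: one row per item, with the shared fields.
items = [
  { item_id: 1, type: "A", serial: "X1" },
  { item_id: 2, type: "B", serial: "X2" }
]

# Type-specific tables, keyed by item_id (illustrative field names).
item_type_a = { 1 => { a_field1: "foo", a_field2: "bar" } }
item_type_b = { 2 => { b_field1: "baz" } }

# LEFT JOIN both type tables: an unmatched side contributes nothing.
joined = items.map do |row|
  row.merge(item_type_a[row[:item_id]] || {})
     .merge(item_type_b[row[:item_id]] || {})
end

p joined.first  # => {:item_id=>1, :type=>"A", :serial=>"X1", :a_field1=>"foo", :a_field2=>"bar"}
```

Each result row carries only the fields its own type defines, which is the class-table payoff compared to one wide table full of nulls.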
I don't want to be accused of being the always-uses-the-latest-useless-technology guy, but depending on your use case, this might be a good fit for a NoSQL database (Tokyo, Mongo, SimpleDB, etc.). Or, as Developer Art suggested, you could just serialize the different fields into a single column. It's not the worst thing in the world.