Is it safe to use instance variables to store transient properties of ActiveRecord objects? - ruby-on-rails-3

I need to store an encoding-state value on a video while it's being encoded.
I have a video object. While the video is being encoded it needs to lock edits on its comments.
The video therefore needs to store its current encoding state (is it happening yes/no?) and allow child comments to query that property.
Please note
I know that there are better ways to solve this particular problem. I actually need to solve a slightly different problem, but I felt its nuances would confuse the question, so I've chosen this one instead. My question is specifically about the nuances of instance variables, not how to better solve this encoding problem (which obviously needs a queue).
class Video < ActiveRecord::Base
  has_many :comments

  after_initialize do
    @encoding_in_process = false
  end

  def encode
    @encoding_in_process = true
    # ... perform the actual encoding ...
    @encoding_in_process = false
  end

  def encoding_in_process?
    @encoding_in_process
  end
end
class Comment < ActiveRecord::Base
  belongs_to :video

  before_update do
    raise "locked" if video.encoding_in_process?
  end

  # ...
end
As you can see, each video instance stores an instance variable @encoding_in_process, which is used to determine whether a comment can be updated.
The problem
There is a danger that there will be multiple in-memory instances of the same video, each with a different value for @encoding_in_process.
e.g.
bieber_video = Video.find_all_by_artist('Bieber').last
bieber_video.encode
# assume this takes a while...
bieber_video.encoding_in_process?
# => true
bieber_copy = Video.find_by_id bieber_video.id
bieber_copy.encoding_in_process?
# => false
# Both ActiveRecord objects refer to the same Bieber video
bieber_copy.id == bieber_video.id
# => true
# ...however they refer to different objects in memory:
puts bieber_video
#<Video:0x00000105a9e948>
puts bieber_copy
#<Video:0x00000105a11111>
# and hence each instance has a different answer for encoding_in_process?:
# bieber_video.encoding_in_process? != bieber_copy.encoding_in_process?
The question
Given that the same database row might generate two different in-memory instances, what is a safe way to store transient non-database-backed information about those instances?
EDIT
The actual problem I'm trying to solve is setting a flag on an object when destroy is initiated such that its child objects can determine whether or not they're eligible to be destroyed themselves.
It's therefore a very short-lived problem and not well suited to being backed by the database. I used this video example because I thought it was a bit clearer; however, I may have simply muddied the waters.
THE SOLUTION (courtesy of one of the answers below)
@Alex D's suggestion did solve the problem, but to add further clarity for anyone wanting to repeat it, the actual code was this:
class Video < ActiveRecord::Base
  # set a class variable containing an array of the ids of all
  # videos which are currently being encoded
  @@ids_of_videos_being_encoded = []

  # ...

  def encode
    store_encoding_state true
    begin
      # ... perform the actual encoding here ...
    ensure
      # make sure we switch this off after
      # encoding finishes or fails
      store_encoding_state false
    end
  end

  def encoding_initiated?
    @@ids_of_videos_being_encoded.include? id
  end

  private

  def store_encoding_state(encoding_in_progress)
    if encoding_in_progress
      @@ids_of_videos_being_encoded.push(id)
    else
      @@ids_of_videos_being_encoded.delete(id)
    end
  end
end

The answer to your question depends on whether you may run multiple server processes or not. If you may want to run multiple server processes (which is a good assumption), the problem is not just multiple in-memory ActiveRecord objects representing the same DB row; the problem is multiple objects in different memory spaces.
If you have multiple processes which are somehow collaboratively working with the same data, you must keep that data in a shared store (i.e. a database), and you must flush changes to the store, and refresh your in-memory data as needed. In this case, you cannot rely on transient in-memory data being kept in synchronization (because there is no way it possibly could be).
If constantly writing/reading your transient data to the DB sounds expensive, that's because it is. In general, whenever you have multiple processes (on the same or different servers) working together, you want to design things so each process can grab a chunk of data and work on it for a while without having to communicate with the others. Fine-grained data sharing in a distributed system = bad performance.
If you are sure that you will only ever use a single server process, and you want to simulate the effect of instance variables which are shared between multiple ActiveRecord objects representing the same DB row, keep the data in a hash, keyed by the record ID, and use getters/setters which read/write the hash. If you are doing a lot of this, you can do some metaprogramming "magic" to have the getters/setters automatically generated (a la "attr_accessor"). If you need help writing that metaprogramming code, post a question and I'll answer it.
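To make that concrete, here is a minimal sketch of the single-process, hash-keyed-by-record-ID approach described above, assuming Rails is loaded; the Transient module and transient_attr macro are hypothetical names, not part of Rails:

module Transient
  def transient_attr(name)
    store = {} # one hash per attribute, shared by every instance of the class, keyed by record id
    define_method(name) { store[id] }
    define_method("#{name}=") { |value| store[id] = value }
  end
end

class Video < ActiveRecord::Base
  extend Transient
  transient_attr :encoding_in_process

  def encoding_in_process?
    !!encoding_in_process
  end
end

# video_a = Video.find(42)
# video_b = Video.find(42)           # a second in-memory instance of the same row
# video_a.encoding_in_process = true
# video_b.encoding_in_process?       # => true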

The video therefore needs to store its current encoding state (is it happening yes/no?) and allow child comments to query that property.
IMO, that's not a good way to do this because of all the synchronization issues that will ensue.
A much better strategy is to start off all videos in an unencoded state, which you store with the video record's metadata. When a video data stream is created, enqueue an encoding task for some worker to carry out. The worker thread will encode the videos, and when it's done, it should update the video's state to encoded.
Now there are no transient-state issues; once the worker marks the video as encoded, anyone trying to comment will see that stored state and be allowed through.
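A rough sketch of that approach, assuming a string encoding_state column on videos and some background-job library (EncodeVideoJob is a hypothetical name):

class Video < ActiveRecord::Base
  has_many :comments

  def request_encoding!
    update_attribute(:encoding_state, 'encoding')
    EncodeVideoJob.enqueue(id)  # a worker process picks this up
  end

  def encoded?
    encoding_state == 'encoded'
  end
end

# In the worker, once the video data stream has been encoded:
#   Video.find(video_id).update_attribute(:encoding_state, 'encoded')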
Given that the same database row might generate two different in-memory instances, what is a safe way to store transient non-database-backed information about those instances?
If they don't need to be synchronized, then there isn't an issue. If they do need to be synchronized, you run the risk of a race condition. You can also call .reload to refresh an object's state from the database.
And if the data needs to be synchronized like that, then you probably do need to store it. In the video encoding example, you should either store each video's encoded/unencoded state or provide an implicit, authoritative way of knowing whether the video is encoded or not.
Update from the original question:
The actual problem I'm trying to solve is setting a flag on an object when destroy is initiated such that its child objects can determine whether or not they're eligible to be destroyed themselves.
Just use the after_destroy callback to invoke an appropriate method on each child object, and let them determine whether they should be destroyed or not. That will look something like this:
class Video < ActiveRecord::Base
  after_destroy :purge_pending_comments!

  def purge_pending_comments!
    comments.map(&:destroy_if_pending)
  end
end
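On the comment side, destroy_if_pending would be an ordinary instance method; a hedged sketch, assuming some kind of pending flag or status column on comments:

class Comment < ActiveRecord::Base
  belongs_to :video

  def destroy_if_pending
    # 'pending?' stands in for whatever rule decides eligibility,
    # e.g. a flag or status column on comments
    destroy if pending?
  end
end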

Related

Memory Architecture of HPX

HPX supports an Active Global Address Space. For a long time I haven't been able to figure out what "AGAS" really is. Researching the memory models supported by HPX-5, I can see that in "AGAS: memory can be moved to other localities in order to balance the system", whereas "PGAS" cannot do this. But in HPX we still create remote objects (components) with a parameter saying where to create them (the global identifiers of localities). Using HPX on a desktop really hides this feature, and running HPX on rostam I can't differentiate it from a "PGAS" memory system. Could you please help me understand this black-magic feature of HPX?
You can think of AGAS as being a distributed key/value 'in memory' database.
When you create an object locally, you get a pointer of the standard variety that is referable using *this or this-> to get access to the internals.
However, you cannot pass a this pointer from one node to another and use it arbitrarily.
When you create an HPX component, or register an object that you created with AGAS, it essentially stores the this pointer in the database and gives you an hpx::id_type as a handle (key). This id can be used in function calls on local or remote nodes as a way of referencing the object.
If you move the object from one node to another (using an AGAS function), then AGAS will update its internal value to reflect the fact that the this pointer has changed (internally, the object's destructor is called locally and its contents are moved into a new object constructed elsewhere) and that it is now located on another node, but the key - the id_type that you hold for that object - is still valid. Note that this is only true if AGAS is doing the relocation; if you just create a copy elsewhere and delete the local object, it's not the same.
On a PGAS system, generally speaking, all the nodes share a block of memory with each other, and each node can 'access' memory/data/objects on the other node, by indexing into this shared memory area. So in PGAS, the addresses of items on other nodes are 'fixed' in the sense that data on node 1 is at shared_region + offset*1, data on node 2 is at + offset*2 and so on. This is a slight simplification, but you get the idea.
In HPX, objects are free to float about and you can reference them via the id_types and let AGAS handle the 'real' address lookups. That is why the 'Active' is in AGAS, as opposed to PGAS.
In this way data items (components) can be relocated from one place to another, but the handles that refer to them can be immutable.
In this sense, the 'Address Space' part of AGAS is saying that hpx::id_type values can be thought of as addresses that span all the nodes in the job.

Design patterns on initializing an object?

What's the recommended way to handle an object that may not be fully initialized?
e.g. taking the following code (off the top of my head in ruby):
class News
  attr_accessor :number

  def initialize(site)
    @site = site
  end

  def setup(number)
    @number = number
  end

  def list
    puts news_items(@site, @number)
  end
end
Clearly if I do something like:
news = News.new("siteA")
news.list
I'm going to run into problems. I'd need to do news.setup(3) before news.list.
But, are there any design patterns around this that I should be aware of?
Should I be creating default values? Or using fixed numbers of arguments to ensure objects are correctly initialized?
Or am I simply worrying too much about the small stuff here.
Should I be creating default values?
Does it make sense to set a default? If so, this is a perfectly valid approach, IMHO.
Or using fixed numbers of arguments to ensure objects are correctly initialized?
You should ensure that your objects cannot be constructed in an invalid state; this will make life much simpler for you and for everyone else using your code.
In your example, not initializing number in some way is a problem, and this setup method is an example of temporal coupling. You should avoid this, and the two ways you suggested both achieve that. Alternatively, you can have another object or a static method responsible for building your object in a valid state instead.
If you do have an object which is not fully initialised, then any invalid method calls should raise appropriate and descriptive exceptions that let users know they are using the code incorrectly, and give examples of the correct usage pattern.
In C#, InvalidOperationException is usually appropriate, and Java has IllegalStateException for the same purpose. Ruby is beyond my pay grade, unfortunately :)
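Since the question is in Ruby, here is a minimal sketch of the two suggestions above, reusing the News example; the default value and the ArgumentError are illustrative choices, not the only valid ones:

class News
  DEFAULT_NUMBER = 10 # assumption: a sensible default exists

  def initialize(site, number = DEFAULT_NUMBER)
    raise ArgumentError, "site is required" if site.nil?
    @site   = site
    @number = number
  end

  def list
    puts news_items(@site, @number) # news_items as in the question
  end
end

# News.new("siteA").list     # uses the default
# News.new("siteA", 3).list  # explicit value, no separate setup step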

Stateful objects, properties and parameter-less methods in favour of stateless objects, parameters and return values

I find this class definition a bit odd:
http://www.extremeoptimization.com/Documentation/Reference/Extreme.Mathematics.LinearAlgebra.SingleLeastSquaresSolver_Members.aspx
The Solve method does have a return value, but it wouldn't need one, because the result is also available in the Solution property.
This is what I see as traditional code:
var sqrt2 = Math.Sqrt(2);
This would be an alternative in the same spirit as the solver in the link:
var sqrtCalculator = new SqrtCalculator();
sqrtCalculator.Parameter = 2;
sqrtCalculator.Run();
var sqrt2 = sqrtCalculator.Result;
What are the pros and cons besides the second version being a bit "untraditional"?
Yes, the compiler won't help the user who forgot to assign some property (parameter) BUT this is the case with all components that contain writeable properties and don't have mandatory values in the constructor.
Yes, threading will not work, BUT each thread can create its own solver.
Yes, the garbage collector won't be able to dispose the solver's result, BUT if the entire solver is disposed it will.
Yes, compilers and processors treat parameters and return values specially, which makes them fast, BUT the time spent on parameter handling is mostly negligible.
And so on. Other ideas?
Well, after a year I found a clear flaw in this "introvert" approach. I am using an existing filter object which should operate on a measurement object, but instead operates on itself in the "it's all me and nothing else" fashion described above. Now the customer wants a recalculation of a measurement object a few minutes after the first calculation, and meanwhile the filter has processed other measurement objects. If the filter had been stateless and stored its data in the measurement object, it would have been an easy matter to implement a Recalculate method. The only way to solve the problem with an introvert filter is to make a filter instance part of the measurement object. Then filters need to be instantiated for every new measurement object, and since filters are part of a chain, the entire chain needs to be recreated. Well, there is some merit to being stateless.

Overextending object design by adding many trivial fields?

I have to add a bunch of trivial or seldom used attributes to an object in my business model.
So, imagine class Foo which has a bunch of standard information such as Price, Color, Weight, Length. Now, I need to add a bunch of attributes to Foo that are rarely deviating from the norm and rarely used (in the scope of the entire domain). So, Foo.DisplayWhenConditionIsX is true for 95% of instances; likewise, Foo.ShowPriceWhenConditionIsY is almost always true, and Foo.PriceWhenViewedByZ has the same value as Foo.Price most of the time.
It just smells wrong to me to add a dozen fields like this to both my class and database table. However, I don't know that wrapping these new fields into their own FooDisplayAttributes class makes sense. That feels like adding complexity to my DAL and BLL for little gain other than a smaller object. Any recommendations?
Try setting up a separate storage class/struct for the rarely used fields and hold it as a single field, say "rarelyUsedFields" (for example, it will be a pointer in C++ and a reference in Java - you don't mention your language.)
Have setters/getters for these fields on your class. Setters will check whether the value differs from the default, lazily initialize rarelyUsedFields, and then set the respective field value (say, rarelyUsedFields.DisplayWhenConditionIsX = false). Getters will read the rarelyUsedFields value and return the default (true for DisplayWhenConditionIsX and so on) if it is NULL, otherwise return rarelyUsedFields.DisplayWhenConditionIsX.
This approach is used quite often, see WebKit's Node.h as an example (and its focused() method.)
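Translated into Ruby (the question doesn't name a language), a sketch for one of the fields from the question might look like this; the others follow the same pattern, and the holder is only allocated when a non-default value is actually set:

class Foo
  RarelyUsedFields = Struct.new(:display_when_condition_is_x)

  def display_when_condition_is_x
    return true if @rarely_used_fields.nil?              # default
    value = @rarely_used_fields.display_when_condition_is_x
    value.nil? ? true : value
  end

  def display_when_condition_is_x=(value)
    return if value == true && @rarely_used_fields.nil?  # still the default
    @rarely_used_fields ||= RarelyUsedFields.new
    @rarely_used_fields.display_when_condition_is_x = value
  end
end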
Abstraction makes your question a bit hard to understand, but I would suggest using custom getters such as Foo.getPrice() and Foo.getSpecialPrice().
The first one would simply return the attribute, while the second would perform operations on it first.
This is only possible if there is a way to calculate the "seldom used version" from the original attribute value, but in most common cases this would be possible, providing you can access data from another object storing parameters, such as FooShop.getCurrentDiscount().
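A small Ruby sketch of that idea (shop and current_discount are hypothetical stand-ins for FooShop.getCurrentDiscount()):

class Foo
  attr_accessor :price # the plain getter simply returns the attribute

  # The "special" getter derives the seldom-used value from the base
  # one, using a parameter held by another object.
  def special_price(shop)
    price * (1 - shop.current_discount)
  end
end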
The problem I see is more about the Foo object having side effects.
In your example, I see two features : display and price.
I would build one or many Displayer (who knows how to display) and make the price a component object, with a list of internal price modificators.
Note all this is relevant only if your Foo objects are called by numerous clients.

serialized object not being converted

I have a model called Statistic which has a value field containing data of a self-defined class, Goals.
class Statistic < ActiveRecord::Base
  serialize :value
end
When I try to access goals_against (an attr_reader of the Goals class) I get
undefined method `goals_against' for #<String:0x54f8400>
The value property contains the following data:
--- !ruby/object:Goals \ngoals: {}\n\ngoals_against: 1\ngoals_for: 0\nversion: 1\n
In string format, according to the debugger.
It seems that Rails doesn't know this data is of type Goals.
Someone knows how to solve this?
Thanks
Three things:
First, wherever your Goals class is defined, make sure it is loaded. At some point Rails stopped auto-loading everything in the lib folder, so wherever your extra classes live, add that path to config.autoload_paths (in config/application.rb).
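For example (assuming the Goals class lives somewhere under lib/):

# config/application.rb, inside the Application class
config.autoload_paths += %W(#{config.root}/lib)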
Second, when you declare a column as serialized, you have the option of specifying the class. This is especially useful when you are working with a custom class and you want to make sure Rails does the conversion correctly.
serialize :value, Goals
Third, when you have a column that is serialized, make sure you have enough room for it. In other words, most of the time you're going to want that column to be "text" and not "string" in your schema (otherwise your sql engine will silently truncate anything too large to fit in a string column and you'll end up saving a broken object).
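If the column is currently a string, a migration along these lines (using the statistics table and value column from the question) will widen it:

class ChangeStatisticsValueToText < ActiveRecord::Migration
  def up
    change_column :statistics, :value, :text
  end

  def down
    change_column :statistics, :value, :string
  end
end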