Making sense of OOP in Lua

I do most of my programming in Python, and I use OOP practices for most of my projects. I recently started taking a look at the Love2D game library and engine. I managed to get some things configured, and then thought about making a GameObject class. But, what's this? Lua doesn't have classes! It has tables, and metatables, and other such things. I'm having a lot of trouble making heads or tails of this even after reading the documentation several times over.
Consider the following example:
catClass = {}
catClass.__index = catClass
catClass.type = "Cat"
function catClass.create(name)
    local obj = setmetatable({}, catClass)
    obj.name = name
    return obj
end
cat1 = catClass.create("Fluffy")
print(cat1.type)
cat2 = catClass.create("Meowth")
cat1.type = "Dog"
print(cat1.type)
print(cat2.type)
print(catClass.type)
The output of this is as follows:
Cat
Dog
Cat
Cat
What I do not understand is why changing cat1.type to "Dog" does not cause identical changes in cat2 and catClass. Does setting a metatable create a copy of the table? Google did not provide useful results (there are very few good Lua explanations).

When you index a table with a key that does not exist, Lua checks whether a metatable is set on the table. If one is, it uses the __index field of that metatable to resolve the key.
When you created cat1, it was given catClass as its metatable. When you then indexed type, Lua saw that cat1 has no entry called type and looked it up through the metatable.
Then you set type on cat1 to "Dog", which creates a key on cat1 itself, not on the metatable. That is why indexing cat1 for type now gives "Dog" and not "Cat".
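Since you come from Python, it may help to see that this is almost exactly how class vs. instance attributes behave there. A minimal sketch mirroring your Lua example (names invented to match it):

```python
class CatClass:
    type = "Cat"  # shared class attribute, like catClass.type in the Lua example

    def __init__(self, name):
        self.name = name

cat1 = CatClass("Fluffy")
cat2 = CatClass("Meowth")

print(cat1.type)      # "Cat" -- not found on the instance, falls back to the class
cat1.type = "Dog"     # creates an attribute on cat1 only, shadowing the class one
print(cat1.type)      # "Dog"
print(cat2.type)      # "Cat" -- cat2 and CatClass are untouched
print(CatClass.type)  # "Cat"
```

In both languages, assignment always writes to the object itself; only *lookup* falls through to the class (Python) or the __index metatable (Lua).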
If you go to http://www.lua.org/ there is documentation and some older editions of Programming in Lua, written by the authors of Lua themselves.

See the setmetatable documentation - the table returned is the table specified in the first argument.
This is a different table for each invocation of create (thanks to {}) and each new table is also distinct from the metatable used. No copy was made, but rather a new table was created which is then "linked"1 to the metatable.
Thus there are three different tables in the above - cat1 (with metatable catClass), cat2 (also with metatable catClass) and catClass itself. Altering cat1, as done, therefore has no effect on the other two tables.
1 See Lua Metatables Tutorial; the use of the __index presented in the metatable effectively emulates JavaScript's [prototype] resolution.
When you look up a key in a table, whatever the key is, and no value has been assigned for that key, Lua will look for an __index key in the table's metatable. If __index contains a table, Lua will look up the originally used key in that table.
However, __index has no effect on assigning to a key - the particular table is just modified as normal. (The tutorial also goes on to explain __newindex, if such write-through behavior is desired.)

Related

Table structure for data with many NULLs

I'm currently trying to model a dynamic data object that may have or lack some properties (the property names are known for the current requirement). It is not known whether new properties will be added later on (but it is almost certain). The modeled object is something along the lines of this:
int id PRIMARY KEY NOT NULL;
int owner FOREIGN KEY NOT NULL;
Date date NOT NULL;
Time time NOT NULL;
Map<String,String> properties;
A property can be of any type (int, bool, string, ...).
I'm not sure how I should model this object in an SQL database. There are two ways I can think of to do this, and I would like some input on which will be the better choice in terms of developer work (maintenance), memory consumption and performance. As a side note: properties are almost always NULL (nonexistent).
(1) I would have a big table that has id, owner, date, time and every property as a column whereas missing properties for a row are modeled as NULL. e.g.
TABLE_X
id|owner|date|time|prop_1|prop_2|prop_3|...
This table would have a lot of NULL values.
If new properties should be added, I would do an ALTER TABLE and insert a new column for every new property.
Here I would do a "usual"
SELECT * FROM TABLE_X ...
(2) I would have a main table with all NOT NULL data:
TABLE_X
id|owner|date|time
And then have a separate table for every property, like this:
TABLE_X_PROP_N
foreign_key(TABLE_X(id))|value
Here would be no NULL values at all. A property either has a value and is in its corresponding table or it is NULL and then does not appear in its table.
To add new properties I would just add another table.
Here I would do a
SELECT * FROM TABLE_X LEFT JOIN TABLE_X_PROP_1 ON ... LEFT JOIN TABLE_X_PROP_2 ON ...
To repeat the question (so you don't have to scroll up):
Which of the two ways to deal with the problem is better in terms of maintenance (work for the developer), memory consumption (on disk) and performance (more queries per second)? Maybe you also have a better idea of how to deal with this. Thanks in advance.
If you go with Option 2, I would think you need 3 tables:
TABLE_HEADER
id|owner|date|time
TABLE_PROPERTY
id|name
TABLE_PROPERTYVALUE
id|headerID(FK)|propertyID(FK)|value
New properties are easy to add, which gives you greater flexibility and lets you iterate much faster. The number of properties also has an effect (for example, if you have 500 properties you aren't going to want a table with 500 columns!). The main downside is that it becomes ugly if you need to attach complex business logic to the properties, as it's a more complex structure to navigate, and you can't enforce data integrity such as NOT NULL for particular fields. If you truly want a property bag like the one you have modeled in your object structure, then this maps easily. Like everything, it depends on your circumstances for what is most suitable.
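To make the three-table layout concrete, here is a minimal runnable sketch using SQLite via Python (purely for illustration; the column types and sample values are my own invention, and in practice you'd pick proper types per database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_header (
    id INTEGER PRIMARY KEY,
    owner INTEGER NOT NULL,
    date TEXT NOT NULL,
    time TEXT NOT NULL
);
CREATE TABLE table_property (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE table_propertyvalue (
    id INTEGER PRIMARY KEY,
    header_id INTEGER NOT NULL REFERENCES table_header(id),
    property_id INTEGER NOT NULL REFERENCES table_property(id),
    value TEXT
);
""")

# One object, with only the two properties it actually has -- no NULL padding
conn.execute("INSERT INTO table_header (id, owner, date, time) VALUES (1, 42, '2024-01-01', '12:00')")
conn.execute("INSERT INTO table_property (id, name) VALUES (1, 'color'), (2, 'weight')")
conn.execute("INSERT INTO table_propertyvalue (header_id, property_id, value) VALUES (1, 1, 'red'), (1, 2, '10')")

# Fetch all properties the object has
rows = conn.execute("""
    SELECT p.name, v.value
    FROM table_propertyvalue v
    JOIN table_property p ON p.id = v.property_id
    WHERE v.header_id = 1
""").fetchall()
print(dict(rows))  # {'color': 'red', 'weight': '10'}
```

Adding a new property later is just an INSERT into table_property; no ALTER TABLE is ever needed.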
Solution 2, but without a separate table for every property. Just put everything in one table:
properties(
    foreign_key(TABLE_X(id)),
    property_name,
    value
);
Sounds like you're trying to implement an Entity-Attribute-Value (often-viewed-as-an-anti-)pattern here. Are you familiar with them? Here's a few references:
https://softwareengineering.stackexchange.com/questions/93124/eav-is-it-really-bad-in-all-scenarios
http://www.dbforums.com/showthread.php?1619660-OTLT-EAV-design-why-do-people-hate-it
https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
Personally, I'm extremely wary of this type of setup in an RDBMS. I tend to think that NoSQL document-style databases are a better fit for these types of dynamic structures, though admittedly I have relatively little real-world experience with NoSQL myself.

SQL vs NoSQL for data that will be presented to a user after multiple filters have been added

I am about to embark on a project for work that is very outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of using each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Fowler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs, but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved by using two separate pieces of technology. The first is to use a relatively well-designed database schema with a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response out of storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead look into using Solr/Lucene as your full text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data if you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have few points that can help you address your questions regarding the relational database structure.
The first thing I see right away is that you are talking about inheritance (at least conceptually). Your objects inherit from each other, so you have additional attributes for derived objects. Say you are adding a new type of object: the first thing you need to do (conceptually) is find a base/super (parent) object type for it that has a subset of the attributes, and then add on top of them (extending the base object type).
Once you get used to thinking this way, the next thing is the inheritance mapping patterns for relational databases. I'll borrow terms from Martin Fowler to describe them here.
You can hold an inheritance chain in the database in one of 3 ways:
1 - Single table inheritance: the whole inheritance chain is in one table, so all new types of objects go into the same table.
Advantages: your search query has only one table to search, which is typically faster than a join, for example.
Disadvantages: the table grows faster than with option 2, for example; you have to add a type column that says which type of object the row represents; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: a separate table for each new type of object.
Advantages: if a search affects only one type, you search only one table at a time; each table grows more slowly than in option 1, for example.
Disadvantages: you need to use a union of queries if searching several types at the same time.
3 - Class table inheritance: one table for the base type with its attributes only, plus additional tables with the additional attributes for each child object type. Child tables refer to the base table with PK/FK relations.
Advantages: all types are present in one base table, so it is easy to search them all together using the common attributes.
Disadvantages: the base table grows fast because it contains part of the child tables too; you need a join to retrieve all types of objects with all their attributes.
Which one to choose?
It's a trade-off, obviously. If you expect many types of objects to be added, I would go with concrete table inheritance, which gives reasonable query and scaling options. Class table inheritance does not seem very friendly to fast queries and scalability. Single table inheritance seems to work better with a small number of types.
Your call, my friend!
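For concreteness, here is a minimal sketch of pattern 3 (class table inheritance), using SQLite via Python purely for illustration; the table and column names are invented for this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base table: attributes common to every object type
CREATE TABLE object_base (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL          -- which child table (if any) extends this row
);
-- Child table: attributes specific to one subtype, PK is also FK to the base
CREATE TABLE object_widget (
    id INTEGER PRIMARY KEY REFERENCES object_base(id),
    voltage INTEGER
);
""")
conn.execute("INSERT INTO object_base VALUES (1, 'plain thing', 'base')")
conn.execute("INSERT INTO object_base VALUES (2, 'a widget', 'widget')")
conn.execute("INSERT INTO object_widget VALUES (2, 230)")

# Searching on common attributes touches only the base table;
# retrieving full subtype instances costs a join.
rows = conn.execute("""
    SELECT b.id, b.name, w.voltage
    FROM object_base b
    LEFT JOIN object_widget w ON w.id = b.id
    ORDER BY b.id
""").fetchall()
print(rows)  # [(1, 'plain thing', None), (2, 'a widget', 230)]
```

The same schema shape translates directly to any RDBMS; only the type names change.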
May as well make this an answer. I should note that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three-table set. You will see it referred to as entity-attribute-value logic on the web... it's a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 6 attributes... the same theory will work for hundreds of products and thousands of attributes. The standard way of holding this in one table requires the product info along with 6 columns to store the data (in this setup at least one third of them are NULL). Adding a new attribute means altering the table to add another column, and coming up with a script to populate existing rows or just leaving the column NULL for everything existing. Not the most fun; it can be a headache.
The alternative to this is a name-value pair setup. You want a 'header' table to hold the values common among your products (like name or price... things that all products always have). In our example above, you will notice that attribute 'a' is used on every record... this means attribute 'a' can be part of the header table as well. We'll call the key column here 'header_id'.
The second table is a reference table that simply stores the attributes that can be assigned to each product, and assigns an ID to each. We'll call the table 'attribute', with 'attr_id' as its key. Rather straightforward; each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what each attribute means. In the future, you will add a row to this table to open up a new attribute for each header.
Final table is a mapping table that actually holds the info. You will have your product id, the attribute id, and then the value. Normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, attribute, value? Any future product can have any combination of attributes stored in this table. Adding new attributes means adding a new line to the attribute table and then populating the detail table as needed.
I believe there is a Wikipedia article for it too... http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply a matter of figuring out the best methodology to pivot out your data (I'd recommend Postgres as an open-source db option here).
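One common way to pivot a detail table back into columns is conditional aggregation. A minimal sketch (SQLite via Python just so it runs anywhere; the same SELECT works in Postgres, and the sample rows mirror the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE detail (prd TEXT, attr TEXT, value TEXT);
INSERT INTO detail VALUES
    ('prd1', 'b', '5 mins'),
    ('prd1', 'c', 'needs spare jack'),
    ('prd2', 'd', 'misc text'),
    ('prd3', 'b', '15 mins');
""")

# Conditional aggregation: one output column per attribute we want pivoted out
rows = conn.execute("""
    SELECT prd,
           MAX(CASE WHEN attr = 'b' THEN value END) AS b,
           MAX(CASE WHEN attr = 'c' THEN value END) AS c,
           MAX(CASE WHEN attr = 'd' THEN value END) AS d
    FROM detail
    GROUP BY prd
    ORDER BY prd
""").fetchall()
for row in rows:
    print(row)
# ('prd1', '5 mins', 'needs spare jack', None)
# ('prd2', None, None, 'misc text')
# ('prd3', '15 mins', None, None)
```

The downside, and the usual EAV caveat, is that the column list in the pivot query has to be maintained (or generated dynamically) as attributes are added.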

Designing SQL database to represent OO class hierarchy

I'm in the process of converting a class hierarchy to be stored in an SQL database.
Original pseudo code:
abstract class Note
{
int id;
string message;
};
class TimeNote : public Note
{
time_t time;
};
class TimeRangeNote : public Note
{
time_t begin;
time_t end;
};
class EventNote : public Note
{
int event_id;
};
// More classes deriving from Note excluded.
Currently I'm having a couple of ideas how to store this in a database.
A. Store all notes in a single wide table
The table would contain all information needed by all classes deriving from Note.
CREATE TABLE t_note(
id INTEGER PRIMARY KEY,
message TEXT,
time DATETIME,
begin DATETIME,
end DATETIME,
event_id INTEGER
);
Future classes deriving from Note need to add new columns to this table.
B. Map each class to a table
CREATE TABLE t_note(
id INTEGER PRIMARY KEY,
message TEXT
);
CREATE TABLE t_timenote(
note_id INTEGER PRIMARY KEY REFERENCES t_note(id),
time DATETIME
);
CREATE TABLE t_timerangenote(
note_id INTEGER PRIMARY KEY REFERENCES t_note(id),
begin DATETIME,
end DATETIME
);
CREATE TABLE t_eventnote(
note_id INTEGER PRIMARY KEY REFERENCES t_note(id),
event_id INTEGER
);
Future classes deriving from Note need to create a new table.
C. Use database normalization and VARIANT/SQL_VARIANT
CREATE TABLE t_note(
id INTEGER PRIMARY KEY,
message TEXT
);
CREATE TABLE t_notedata(
note_id INTEGER REFERENCES t_note(id),
variable_id TEXT, -- or "variable_id INTEGER REFERENCES t_variable(id)".
-- where t_variable has information of each variable.
value VARIANT
);
Future classes deriving from Note need to add new variable_id.
D. Map each concrete class to a table (newly added based on current answers)
CREATE TABLE t_timenote(
id INTEGER PRIMARY KEY,
message TEXT,
time DATETIME
);
CREATE TABLE t_timerangenote(
id INTEGER PRIMARY KEY,
message TEXT,
begin DATETIME,
end DATETIME
);
CREATE TABLE t_eventnote(
id INTEGER PRIMARY KEY,
message TEXT,
event_id INTEGER
);
Future classes deriving from Note need to create a new table.
What would the most logical representation in SQL be?
Are there any better options?
In general I prefer option "B" (i.e. one table for the base class and one table for each "concrete" subclass).
Of course this has a couple of drawbacks: first of all, you have to join at least 2 tables whenever you have to read a full instance of a subclass. Also, the "base" table will be constantly accessed by anyone who has to operate on any kind of note.
But this is usually acceptable unless you have extreme cases (billions of rows, very quick response times required and so on).
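To make the join cost concrete, here's a minimal sketch of option B's read path (SQLite via Python purely for illustration; the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_note (id INTEGER PRIMARY KEY, message TEXT);
CREATE TABLE t_timenote (
    note_id INTEGER PRIMARY KEY REFERENCES t_note(id),
    time TEXT
);
INSERT INTO t_note VALUES (1, 'standup');
INSERT INTO t_timenote VALUES (1, '09:30');
""")

# Reading one full TimeNote instance always costs a join against the base table
row = conn.execute("""
    SELECT n.id, n.message, t.time
    FROM t_note n JOIN t_timenote t ON t.note_id = n.id
    WHERE n.id = 1
""").fetchone()
print(row)  # (1, 'standup', '09:30')
```

Conversely, operations on common attributes (searching all messages, say) only ever touch t_note.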
There is a third possible option: map each subclass to a distinct table. This helps partitioning your objects but costs more in development effort, in general.
See this for a complete discussion.
(Regarding your "C" solution, using VARIANT: I can't comment on the merits/demerits, because it looks like a proprietary solution - what is it ? Transact-SQL? and I am not familiar with it).
Your 'B' option as described is pretty much an implementation of the 'Object Subclass Hierarchy' (Kung, 1990: http://portal.acm.org/citation.cfm?id=79213).
As such, it's a well established and understood method. It works quite well. It's also extensible through multiple levels of inheritance, should you need it.
Of course you lose some of the benefits of encapsulation and information hiding if you don't restrict who can access the data through the DBMS interface.
You can, however, access it from multiple systems, and even languages, simultaneously (e.g. Java, C++, C#).
(This was the subject of my Masters dissertation :)
You've hit the 3 most commonly accepted ways of modeling objects in a relational database. All 3 are acceptable, and each has its own pros and cons. Unfortunately, that means there's no cut-and-dried "right" answer. I've implemented each of them at different times, and here are a couple of notes/caveats to keep in mind:
Option A has the drawback that, when you add a new subclass, you must modify an existing table (this may be less palatable to you than adding a new table). It also has the drawback that many columns will contain NULLs. However, modern DBs seem MUCH better at managing space than older DBs, so I've never been too worried about nulls. One benefit is that none of your search or retrieve operations will require JOINs or UNIONs, which means potentially better performance and simpler SQL.
Option B has the drawback that, if you add a new property to your superclass, you need to add a new column to each and every subclass's table. Also, if you want to do a heterogeneous search (all subclasses at once), you must do so using a UNION or JOIN (potentially slower performance and/or more complex sql).
Option C has the drawback that all retrieval operations (even for just one subclass) will involve a JOIN, as will most searches. Also, all inserts will involve multiple tables, making for somewhat more complex SQL, and will necessitate use of transactions. This option seems to be the most "pure" from a data-normalization standpoint, but I rarely use it because the JOIN-for-every-operation drawback usually makes one of the other options more palatable.
I'd gravitate towards option A myself.
It also depends a bit on your usage scenarios; for example, will you need to do lots of searches across all types of notes? If yes, then you might be better off with option A.
You can always store them as option A (one big table) and create views for the different sub-notes if you so please. That way, you can still have a logical separation while having good searchability.
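A minimal sketch of that idea (SQLite via Python purely for illustration; I've added a discriminator column, which option A needs for the views to work, and renamed begin/end since they are SQL keywords):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_note (
    id INTEGER PRIMARY KEY,
    message TEXT,
    note_type TEXT,      -- discriminator column (an addition to option A as written)
    time TEXT,
    begin_time TEXT,     -- renamed: BEGIN/END are awkward as column names
    end_time TEXT,
    event_id INTEGER
);
CREATE VIEW v_timenote AS
    SELECT id, message, time FROM t_note WHERE note_type = 'time';
CREATE VIEW v_eventnote AS
    SELECT id, message, event_id FROM t_note WHERE note_type = 'event';
""")
conn.execute("INSERT INTO t_note (id, message, note_type, time) VALUES (1, 'lunch', 'time', '12:00')")
conn.execute("INSERT INTO t_note (id, message, note_type, event_id) VALUES (2, 'concert', 'event', 7)")

time_rows = conn.execute("SELECT * FROM v_timenote").fetchall()
event_rows = conn.execute("SELECT * FROM v_eventnote").fetchall()
print(time_rows)   # [(1, 'lunch', '12:00')]
print(event_rows)  # [(2, 'concert', 7)]
```

Application code can then query the views as if each subclass had its own narrow table, while searches across all notes hit the single underlying table.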
Generally speaking, but this might be close to a religious discussion so beware, I believe that a relational database should be a relational database and not try to mimic an OO structure. Let your classes do the OO stuff, let the db be relational. There are specific OO databases available if you want to extend this to your datastore. It does mean that you have to cross the 'Object-relational impedance mismatch' as they call it, but again there are ORM mappers for that specific purpose.
I would go for option A.
Solution B is good if the class hierarchy is very complex, with dozens of classes inheriting from each other. It's the most scalable solution. However, the drawback is that it makes the SQL more complex and slower.
For relatively simple cases, like 4 or 5 classes all inheriting from the same base class, it makes more sense to choose solution A. The SQL would be simpler and faster, and the overhead of having additional columns with NULL values is negligible.
There's a series of patterns collectively known as "Crossing Chasms" I've used for many years. Don't let the references to Smalltalk throw you - it's applicable to any object oriented language. Try the following references:
A Pattern Language for Relational Databases and Smalltalk
Crossing Chasms - The Static Patterns
Crossing Chasms - The Architectural Patterns
Share and enjoy.
EDIT
Wayback Machine links to everything I've been able to find on the Crossing Chasms patterns:
http://web.archive.org/web/20040604122702/http://www.ksccary.com/article1.htm
http://web.archive.org/web/20040604123327/http://www.ksccary.com/article2.htm
http://web.archive.org/web/20040604010736/http://www.ksccary.com/article5.htm
http://web.archive.org/web/20030402004741/http://members.aol.com/kgb1001001/Chasms.htm
http://web.archive.org/web/20060922233842/http://people.engr.ncsu.edu/efg/591O/s98/lectures/persistent-patterns/chasms.pdf
http://web.archive.org/web/20081119235258/http://www.smalltalktraining.com/articles/crossingchasms.htm
http://web.archive.org/web/20081120000232/http://www.smalltalktraining.com/articles/staticpatterns.htm
I've created a Word document which integrates all the above into something resembling a coherent whole, but I don't have a server I can drop it on to make it publicly available. If someone can suggest a free document repository I'd be happy to put the doc up there.
I know this question is old, but I have another option:
You can store a Note object, or a collection of Note objects, in any text-type table column as a JSON structure. You can serialize and deserialize the JSON using Newtonsoft. You will need to set the type name handling option to Objects on the JsonSerializer.

Designing an append-only data access layer with LINQ to SQL

I have an application in mind which dicates database tables be append-only; that is, I can only insert data into the database but never update or delete it. I would like to use LINQ to SQL to build this.
Since tables are append-only but I still need to be able to "delete" data, my thought is that each table Foo needs to have a corresponding FooDeletion table. The FooDeletion table contains a foreign key which references a Foo that has been deleted. For example, the following tables describe the state "Foos 1, 2, and 3 exist, but Foo 2 and Foo 3 have been deleted".
Foo        FooDeletion
id         id   fooid
----       ------------
1          1    2
2          2    3
3
Although I could build an abstraction on top of the data access layer which (a) prevents direct access to LINQ to SQL entities and (b) manages deletions in this manner, one of my goals is to keep my data access layer as thin as possible, so I'd prefer to make the DataContext or entity classes do the work behind the scenes. So, I'd like to let callers use Table<Foo>.DeleteOnSubmit() like normal, and the DAL knows to add a row to FooDeletion instead of deleting a row from Foo.
I've read through "Implementing Business Logic" and "Customizing the Insert, Update, and Delete Behavior of Entity Classes", but I can't find a concrete way to implement what I want. I thought I could use the partial method DataContext.DeleteFoo() to instead call ExecuteDynamicInsert(FooDeletion), but according to this article, "If an inapplicable method is called (for example, ExecuteDynamicDelete for an object to be updated), the results are undefined".
Is this a fool's errand? Am I making this far harder on myself than I need to?
You have more than one option - you can either:
a) Override SubmitChanges, take the change set (GetChangeSet()) and translate updates and deletes into inserts.
b) Use instead-of triggers db-side to change the updates/delete behavior.
c) Add a new Delete extension method to Table that implements the behavior you want.
d) ...or combine a+b+c as needed...
If you want a big-boy enterprise-quality solution, you'd put it in the database - either b) from above, or CRUD procedures <- my preference... triggers are evil.
If this is a small shop, without a lot of other developers or teams, or the data is of minimal value such that a second or third app trying to access it isn't likely, then stick with whatever floats your boat.
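Whichever layer ends up owning the behavior, the underlying data pattern is small. A minimal sketch of the Foo/FooDeletion scheme from the question (SQLite via Python purely for illustration; the "live rows" query is the part any of options a-c would ultimately produce):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER PRIMARY KEY);
CREATE TABLE foo_deletion (
    id INTEGER PRIMARY KEY,
    foo_id INTEGER NOT NULL REFERENCES foo(id)
);
INSERT INTO foo (id) VALUES (1), (2), (3);
-- "Delete" Foo 2 and Foo 3 by inserting deletion records, never by deleting
INSERT INTO foo_deletion (foo_id) VALUES (2), (3);
""")

# Live rows are those with no corresponding deletion record
live = conn.execute("""
    SELECT f.id FROM foo f
    WHERE NOT EXISTS (SELECT 1 FROM foo_deletion d WHERE d.foo_id = f.id)
""").fetchall()
print(live)  # [(1,)]
```

Wrapping that SELECT in a view (or, for option b, having an INSTEAD OF DELETE trigger write to foo_deletion) keeps callers unaware that nothing is ever physically removed.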

Hibernate and IDs

Is it possible in hibernate to have an entity where some IDs are assigned and some are generated?
For instance:
Some objects have an ID between 1-10000 that are generated outside of the database; while some entities come in with no ID and need an ID generated by the database.
You could use 'assigned' as the id generation strategy, but you would have to give the entity its id before you saved it to the database. Alternatively, you could build your own implementation of org.hibernate.id.IdentifierGenerator to provide the id in the manner you've suggested.
I have to agree with Cade Roux, though; doing so seems like it would be much more difficult than using a built-in increment, uuid, or other form of id generation.
I would avoid this and simply have an auxiliary column for the information about the source of the object and a column for the external identifier (assuming the external identifier was an important value you wanted to keep track of).
It's generally a bad idea to use columns for mixed purposes - in this case to infer from the nature of a surrogate key the source of an object.
Use any generator you like, make sure it can start at an offset (when you use a sequence, you can initialize it accordingly).
For all other entities, call setId() before you insert them. Hibernate will only generate an id if the id property is 0. Note that you should first insert objects with ids into the db and then work with them. There is a lot of code in Hibernate which expects the object to be in the DB when id != 0.
Another solution is to use negative ids for entities which come with an id. This will also make sure that there are no collisions when you insert a new object.
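The offset-sequence idea from the previous answer can be shown at the database level. A sketch using SQLite via Python (just to illustrate the principle; in Hibernate you'd configure the equivalent sequence start value, and the table/values here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
# Reserve 1-10000 for externally assigned ids: seed the sequence at 10000
conn.execute("INSERT INTO sqlite_sequence (name, seq) VALUES ('entity', 10000)")

# Externally assigned id within the reserved range
conn.execute("INSERT INTO entity (id, name) VALUES (42, 'assigned')")
# No id supplied: the database generates one above the offset
conn.execute("INSERT INTO entity (name) VALUES ('generated')")

rows = conn.execute("SELECT id, name FROM entity ORDER BY id").fetchall()
print(rows)  # [(42, 'assigned'), (10001, 'generated')]
```

Assigned and generated ids can then never collide, because they live in disjoint ranges of the same column.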