Is it possible to use NHibernate with partitioning of an object over several tables? - nhibernate

We have a system that gathers large quantities of data each month and performs rather advanced calculations that grow the database even further. The customer requires that the last three years of data be stored for fast access, and that older data (up to ten years) must still be accessible; the older data may be slower to reach and may require some extra work. We want to avoid performance issues where the database and its tables grow out of proportion.
After discussing SQL Server Enterprise (VERY costly and full of traps, since we don't have the know-how), and since our system has so many tables that reference each other, we are leaning towards creating some kind of history tables. We would move data into them monthly and rewrite our select queries so that, based on parameters, they search the regular table, the history table, or both, depending on the situation.
Since we are also using NHibernate for mapping, I was wondering if it is possible to create a mapping file that handles this (almost) by itself, using some sort of polymorphism or inheritance in which each object is stored in different tables based on parameters?
I know this sounds complicated and strange, and that there are other methods to achieve this, but for this question I would rather have people answer the question as asked, not suggest alternatives.

As far as I know, NHibernate can't do that (each class can be mapped to only one table/view), but you can use SQL queries or stored procedures (depending on the version of NHibernate that you are using) to populate mapped objects.
In your case you can have a combined view, created by taking the union of the different tables, and then use a SQL query to populate your entity.
There is also another option: create a summary object for your queries that is mapped to that view; then you can use both HQL and Criteria to query this object.
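As a rough illustration of that combined-view idea (a minimal sketch; the table and column names here are hypothetical), the view could simply UNION the live table with its history twin:

CREATE VIEW MeasurementCombined AS
SELECT Id, CustomerId, Value, CreatedOn, 0 AS IsHistoric
FROM Measurement
UNION ALL
SELECT Id, CustomerId, Value, CreatedOn, 1 AS IsHistoric
FROM MeasurementHistory;

A read-only class mapped to MeasurementCombined (or populated through a named SQL query) can then be queried with HQL or Criteria like any other entity, with IsHistoric available as a filter.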

Short answer: "no". I would not create views either, since as you mention there is a lot of joining involved.
Personally, I would create summary tables and map to these directly, using a stateless session or at the very least mutable="false" on the class definition. Think of these summary tables as denormalised data for reporting only. The only drawback is that if historic data changes on a regular basis then the summary tables also need changing. If historical data never changes then this should be simple to achieve.
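A minimal sketch of such a summary table, rebuilt monthly (all names here are hypothetical and the aggregation is illustrative only):

CREATE TABLE MonthlySummary (
    SummaryYear   int            NOT NULL,
    SummaryMonth  int            NOT NULL,
    CustomerId    int            NOT NULL,
    TotalValue    decimal(18, 2) NOT NULL,
    PRIMARY KEY (SummaryYear, SummaryMonth, CustomerId)
);

INSERT INTO MonthlySummary (SummaryYear, SummaryMonth, CustomerId, TotalValue)
SELECT YEAR(CreatedOn), MONTH(CreatedOn), CustomerId, SUM(Value)
FROM Measurement
GROUP BY YEAR(CreatedOn), MONTH(CreatedOn), CustomerId;

The class mapped to this table then carries mutable="false" so NHibernate never attempts to write back to it.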
I would also most probably store these summary tables in another catalog rather than adding to the size of the current system.
It's not a quick win, this one, I'm afraid.

Related

Why does Magento store EAV data across attribute types?

In the Magento system, there are 5 tables that store EAV data, split across attribute types. Is this an effective choice for performance? When I write a SQL query, I still need to use a UNION clause to get the whole data set. If I instead used one mixed table to store EAV data with a single data type (varchar, or sql_variant in SQL Server 2008), what performance issues would I encounter in the future?
Is it an effective performance choice to do it?
The Magento developers chose to use an EAV structure because it performs well under high volumes of data. A flat table structure would be suitable for a small setup, but as it scales it becomes less and less efficient.
When I make a sql query, I still need to use UNION clause to get the whole data set.
You should try in every possible case to avoid direct SQL queries on a Magento database. Magento provides Setup models that you can use for installing new data: either use the Magento models that already exist and the methods the Setup models provide to create/modify the core config data or variables, or use the underlying Zend Framework ORM if you need to create new tables, etc.
In terms of the EAV part of the database specifically, the way it is set up is complicated if you attack it from the SQL point of view, which is why the Magento models exist: so that it can all be wrapped up in the PHP ORM. Again, avoid SQL queries if you can.
If you have to make direct queries, you wouldn't be writing UNION queries but joins onto those tables, and you'd use the eav_attribute table as a pivot to provide you with both the attribute_id (primary key) and the source table in which the value lives.
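As a hedged sketch of what such a join looks like (Magento 1 style table names; here loading the 'name' attribute, whose backing type is varchar, for each product):

SELECT e.entity_id, e.sku, v.value AS name
FROM catalog_product_entity e
JOIN eav_attribute a
    ON a.attribute_code = 'name'
    AND a.entity_type_id = e.entity_type_id
JOIN catalog_product_entity_varchar v
    ON v.entity_id = e.entity_id
    AND v.attribute_id = a.attribute_id
    AND v.store_id = 0;  -- 0 = the default (global) scope

Each further attribute you need means one more join against the value table matching its backend type (varchar, int, decimal, datetime or text).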
By using direct SQL queries you also lose the fallback system that Magento implements where store or website level values can exist, and the Magento models will select them if you ask for them at a store level. If you want to do this manually with SQL then the queries become more complicated as you need to look for those values and if they aren't found, revert to the default (global scope) value.
If I use one mixed table to store EAV data with a single data type (varchar, or sql_variant in SQL Server 2008), what performance issues will I encounter in the future?
As mentioned before, it depends on the expected scale of your database. You will notice that there are plenty of flat tables in a standard Magento database, and that EAV structures only apply to the parts that Magento developers have decided may increase drastically in volume (customers, catalog etc).
If you want to implement a custom module and you think that it also has the potential to grow quickly over time then you can implement your own EAV tables for it. The Magento model scaffolds support this, and there are plenty of resources online about how to set them up.
If your tables are likely to remain (relatively) small, then by all means go for a flat table approach. If it's a custom module and you notice rapid growth, you can always convert it later before it becomes a bottleneck.

Normalisation and multi-valued fields

I'm having a problem with my students using multi-valued fields in Access and getting confused about normalisation as a result.
Here is what I can make out. Given a 1-to-many relationship, e.g.
Articles      Comments
--------      --------
artID{PK}     commID{PK}
text          text
              artID{FK}
Access makes it possible to store this information in what appears to be one table, something like
Articles
--------
artID{PK}
text
comment
+ value
"value" referring to multiple comment values for the comment "column", which access actually stores as a separate table. The specifics of how the values are stored - table, its PK and FK - is completely hidden, but it is possible to query the multi-valued field, e.g. in the example above with the query
INSERT INTO article( [comment].Value )
VALUES ('thank you')
WHERE artID = 1;
But the query doesn't quite reveal the underlying structure of the hidden table implementing the multi-valued field.
Given this (disaster, in my view) - my problem is how to help newcomers to database design and normalisation understand what Access is offering them, why it may not be helpful, and that it is not a reason to ignore the basics of the relational model. More specifically:
Are there better ways, besides queries as above, to reveal the structure behind multi-valued fields?
Are there good examples of where the multi-valued field is not good enough, and shows the advantage of normalising explicitly?
Are there straightforward ways to obtain the multi-select visual output of Access multi-values, but based on separate, explicit tables?
Thanks!
I cannot give you advice on using this feature, because I have never used it; however, I can give you reasons not to use it.
I want to have full control on what I'm doing. This is not the case for multi-valued fields, therefore I don't use them.
This feature is not expandable. What if you want to add a date field to your comments, for instance? (A normalised sketch addressing exactly this follows after this list of reasons.)
It is sometimes necessary to upsize an Access (backend) database to a "big" database (SQL Server, Oracle). These databases don't offer such a feature. It is often the customer who decides which database has to be used. Recently I had to migrate an Access application (frontend) using an Oracle backend to a SQL Server backend because my client decided to drop his Oracle server. It is therefore a good idea to restrict yourself to common features.
For common tasks like editing lookup tables I created generic forms. My existing solutions will not work with multi-valued fields.
I have a (self-made) tool that synchronizes changes in the structure of the database on my developer’s site with the database on the client’s site. This tool cannot deal with multi-valued fields.
I have tools for the security management that can grant SELECT, INSERT, UPDATE and DELETE rights on tables or revoke them. Again, the management tool does not work with multi-valued fields.
Having a separate table for the comments allows you to quickly inspect all the comments (by opening the table). You cannot do this with multi-valued fields.
You will not see the 1 to n relation between the articles and the comments in a database diagram.
With a separate table you can choose whether you want to cascade deletes to the details table or not. If you don't, you will not be able to delete an article as long as there are comments attached to it. This can be desirable, if you want to protect the comments from being deleted inadvertently.
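As promised above, here is a minimal normalised sketch in Access DDL (names are hypothetical; the text columns are renamed artText/commText to avoid the TEXT type keyword):

CREATE TABLE Articles (
    artID    AUTOINCREMENT PRIMARY KEY,
    artText  MEMO
);

CREATE TABLE Comments (
    commID     AUTOINCREMENT PRIMARY KEY,
    commText   MEMO,
    createdOn  DATETIME,
    artID      LONG NOT NULL REFERENCES Articles (artID)
);

Adding a date (or author) field to comments is an ordinary ALTER TABLE on Comments; with a multi-valued field there is nowhere to put it. The multi-select visual output can then be rebuilt with a subform or list box bound to Comments, as a later answer describes.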
It is important to realize the difference between physical and logical relationships. Today the whole internet and web services (SOAP) rely very much on data formats that are multi-valued in nature.
When you represent multi-valued data with a relational database (such as Access), then behind the scenes you are using a traditional (and legitimate) relation. I cannot stress this enough: the use of multi-value columns in Access is in fact a LEGITIMATE relational model.
The fact that the table is not exposed does not negate this. In fact, if you represent an invoice (master record, and repeating details) as an XML data cube, then we see three things:
1) you can build and represent that invoice with a relational database like Access
2) such a relational data model that is normalized can ALSO be represented as a SINGLE XML string
3) deleting the XML record (or string) means that a cascade delete of the child rows (invoice details) MUST occur.
So while it is true that multi-value fields were added to Access to deal with SharePoint, it is MOST important to realize that such data can be mapped to a relational database (if you could not do this, then Access could not consume that XML data using relational database tables, as ACCESS CURRENTLY DOES RIGHT NOW).
And with the web, XML, and SharePoint, the need to consume, manage and utilize such data is not only widespread, but is in fact a basic staple of the internet.
As more and more data becomes complex in nature, we find the requirement for multi-value data exploding in use. Anyone who has used that so-called "fad" the internet is thus relying on and using data that is in fact VERY OFTEN XML and multi-value (complex) in nature.
As long as the logical (not physical) relational data model is kept, then the use of multi-value columns to represent such data is possible, and this is exactly what Access is doing (it is mapping the relational data model to a complex model). Note that the complex (XML) data model does NOT necessarily have to be relational in nature. However, if you ARE going to map such data to Access, then the complex multi-value model MUST CONFORM TO A RELATIONAL data model.
This is EXACTLY what is occurring in Access.
The fact that such a correct and legitimate mathematical relational model is not exposed is of little issue here. Are we to suggest that because Excel does not expose the binary codes it uses, users will never learn about computers? Or perhaps we must all program in assembler so we all correctly learn how computers work.
At the end of the day, who cares, and why does this matter? The fact that people drive automatic cars today does not toss out the concept that they are using different gears to operate that car. The idea that we shut down all of society because someone is going to drive an automatic car, or in this case use complex data, would be galactically stupid on our part.
So keep in mind that extensions to SQL do exist in Access to query the multi-value data, but as well pointed out here, the underlying tables are not exposed. However, as noted, exposing such tables would STILL REQUIRE one not to change or mess with cascade delete, since that feature is required TO MAINTAIN AN INTERSECTION OF FEATURES and a CORRECT MATHEMATICAL relational model between the complex data model (XML) and that of using two related tables to represent such data.
In other words, you can use related tables to represent the complex data model IF YOU REMOVE the ability of users to play with the referential integrity options. The RI options MUST remain as set in those hidden tables, or else such data will not be able to make the trip BACK to the XML or complex data model from which it was consumed.
As noted, just as we don't teach how gasoline reacts with oxygen to people learning to drive a car, forcing someone using a word processor to learn the relational model and exposing the underlying tables makes little sense here.
However, the points made here in regards to such tables being exposed are legitimate concerns.
The REAL problem is that SQL Server, Oracle etc. cannot consume or represent that complex data, WHILE ACCESS CAN CONSUME such data.
As noted, the complex data ship sailed LONG ago! XML, SOAP, and the basic technologies of the internet are based on this complex data model.
In effect, that SQL Server, Oracle and most databases cannot consume this multi-value data and represent it without users having to create and model such data in a relational fashion is a BIG shortcoming of SQL Server etc.
Access stands alone in this ability to consume this data.
So, for anyone who used a smartphone, iPad or the web, you are using basic technologies that are built around using complex data, something that Access now allows.
It is likely that the rest of the industry will have to follow suit given that more and more data is complex in nature. If the database industry does not change, then the mainstream traditional relational database system will NOT be the resting place of such data.
A trend away from storing data in related tables is occurring at a rapid pace right now and products like SharePoint, or even Google docs is proof of this concept. So Access is only reacting to market pressures and it is likely that other database vendors will have to follow suit or simply give up on being part of the "fad" called the internet.
XML and complex data structures are STAPLE and fact of our industry right now – this is not an issue we all should run away from, but in fact embrace.
Albert D. Kallal (Access MVP)
Edmonton, Alberta Canada
kallal#msn.com
The technical discussion is interesting. I think the real problem lies in student understanding. Because it is available in Access, students will use it, and initially it will probably provide a simple solution to some design problems. The negatives will occur later, when they try to use the data. Maybe a simple example demonstrating the problems would persuade some students to avoid using multi-valued fields? Maybe an example of storing the data in another, more usable format would help?
Good luck !
Peter Bullard
MS Access does a great job of simplifying database management and abstracting away a lot of complexity. This, however, makes learning DBMS concepts a bit difficult. Have you tried using other 'standard' DBMS tools like MySQL (or even SQLite)? From a learning perspective they may be better.
I know this post is old. But, it's not quite the same as every other post I've seen on this topic. This one has someone making a good case for using Multi Valued Fields...
As someone who is still trying very hard to get their head around Access, I find the discussion for and against using multi-valued fields incredibly frustrating.
I'm trying to sort through it all, but if everyone is so against them, what is an alternative method? It seems that in every search result I find everyone is either telling you how to use Multi Valued Fields and Controls or telling you how horrible and what a mistake they are. Many people refer to an alternative to them, but nobody says "Here's an example". I'm here to learn about these things. And while I know that this is a simpler concept for a lot of people in these forums, I could really use some examples to take a look at.
I'm at a point where I have to decide which way to go. It would be wonderful to compare examples of using Multi Valued Fields and alternatives and using a control to select multiple values.
Or am I wrong, and is the functionality of a combobox where you can select multiple items only available through Access?
I want to address the last of your questions first. There is a way of providing a visual presentation of a parent child relationship. It's called subforms. If you get help about subforms in Access, it will explain the concept.
I have used subforms in a project where I wanted to display the transaction header in a form and the transaction details in a subform. There is nothing to hinder this construct even when the data is stored in two normalized tables.
Of course, this affects the screen, not the database. That's the whole point. Normalization is relevant to storage and retrieval, not to other uses of data.

Database model refactoring for combining multiple tables of heterogeneous data in SQL Server?

I took over the task of redeveloping a database of scientific data used by a web interface, where the original author had taken a 'table-per-dataset' approach. This didn't scale well and is now fairly difficult to manage, with more than 200 tables having been created. I've spent quite a bit of time trying to figure out how to wrangle the thing, but the datasets contain heterogeneous values, so it is not reasonably possible to combine them into one table with a fixed schema for the column definitions.
I've explored the possibility of EAV and XML columns, and ended up attempting a table with many sparse columns, since the database is running on SQL Server 2008. The DBAs are having some issues with my recently created sparse columns causing havoc with their backup scripts, so I'm left wondering again whether there isn't a better way to do this. I know that EAV does not lead to decent performance, and my experiments with XML data types also showed poor performance, probably due to the large number of records in some of the tables.
Here's the summary:
Around 200 tables, most of which have a few columns containing floats and small strings
Some tables have as many as 15,000 records
Table schemas are not consistent, as the columns depended on the number of samples in the original experimental data.
SQL Server 2008
I'll be treating most of this data as legacy in the new version I'm developing, but I still need to be able to display and query it - and I'd rather not have to do so by dynamically specifying the table name in my stored procedures, as with the current multi-table approach. Any suggestions?
I would suggest that the first step is to rationalise the data through views: attempt to consolidate similar data sets into logical pools.
You could then refactor the code to use the views, and see whether the web platform operates effectively. From there you can decide whether or not the view structure is beneficial and, if so, look at physically rationalising the data into new tables.
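As a hedged sketch of that consolidation step, assuming two of the dataset tables happen to share a compatible shape (all names hypothetical):

CREATE VIEW dbo.TemperatureDatasets AS
SELECT 'Experiment042' AS Dataset, SampleId, MeasuredValue
FROM dbo.Experiment042
UNION ALL
SELECT 'Experiment057' AS Dataset, SampleId, MeasuredValue
FROM dbo.Experiment057;

The Dataset discriminator column then stands in for the dynamic table name, so a stored procedure can filter with WHERE Dataset = @name instead of building SQL strings.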
The benefit of using views in this manner is that you should be able to squeeze a little performance out of indexes on the views, and it should also give you a better handle on the data (that said, since you are developing the new version, that suggests you are perfectly capable of understanding the problem domain).
With 200 tables of simple raw data sets, and considering you believe your version will be taking over, I would probably go through the prototype exercise of seeing whether you can write the views so they carry the exact names your final tables will have in V2; that way you can also back-test whether your new database structure is in fact going to work.
Finally, a word to the wise: when someone has built a database in the way you've described, they did it for a reason. Without looking at the data and really knowing the problem set, you can't tell whether it was bad design, or whether there was a cause for what now appears on the surface to be bad design. You raise consistency as an issue: look to wrap the data and see how consistent you can make it.
Good luck!

SQL Views vs. Database Abstraction (in code)

I just learned about SQL views, which seem nice, but if I abstract table joining into a data access class, wouldn't that accomplish the same thing? What are your thoughts on this? I've never used views before, so this is all pretty new to me.
Remember that not all applications that hit your database may be using the same data access classes. Nor are those classes used in exports or reporting. Views are a better place to abstract some complex things (such as how to get certain kinds of financial information) if you want consistency. However, don't go overboard with abstracting things into views either. Views should not call views (at least in SQL Server, but I suspect in other databases as well), because you have to materialize all the underlying views to get the data in the top layer. This means that to get to the three records you want, you might end up materializing millions of records first. With large tables this can create a performance problem of nightmare proportions. Furthermore, views that call views that call views become a truly difficult maintenance problem when something needs to be fixed.
The main purpose of a view is to abstract the complexity of creating a specific result set. In large relational databases, you often need to join many tables together to get useful data. By creating views, any client can access it without needing access to your database access layer.
Additionally, many RDBMSs optimize queries against views by caching the parsed definition and execution plan. If your query is complicated enough, you may pay a substantial query-planning cost every time you execute it ad hoc; with a view, the plan is typically compiled the first time the view is used and then reused from the cache.
Views can also be great for maintaining backward compatibility. Say you have a table that needs to change, but it would be difficult to update all the clients at once to use the new schema. You can create the new table with the new schema, and then create a view with the old table's name that provides backwards compatibility.
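A minimal sketch of that pattern in SQL Server (all names hypothetical):

-- The real table moves to a new name...
EXEC sp_rename 'dbo.Customer', 'CustomerV2';
GO
-- ...and a view keeps the old contract alive for existing clients:
CREATE VIEW dbo.Customer AS
SELECT Id, FullName AS Name, Phone
FROM dbo.CustomerV2;

Because this view is a simple single-table projection it is even updatable, so older clients can often keep writing through the old name while new code targets CustomerV2 directly.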
I'd say one of the main purposes of views is to simplify the interface between a complex database (whether star schema or OLTP) and another layer (user / OLAP cube / reporting interface).
Broadly, I'd say that if your database can be reached in multiple ways (MS Excel, a .NET app), then you would want to use SQL views, since they are available to all of them; whereas if you create a data access class in C# (for example), it won't be usable by the MS Excel people.
Views, simply put, hide the complex look of all the joins put together in a SQL query.
So instead of executing a join on 30 tables everywhere it is needed, a view does the 30-table join once and can then be reused in another view or sproc to simply say:
SELECT * FROM myView
Rather than:
SELECT...
FROM
...
INNER JOIN
...
INNER JOIN
...
INNER JOIN
...
It basically hides all these details. This article should be a great reference: http://www.craigsmullins.com/cnr_0299b.htm
The point is that views are not physical structures; they are simply a relational model, or "view", of one or many tables in a database system.
Abstracting joins to a data access class might give you the same data, but it might not give you the same performance.
Also, for most businesses the database is a shared resource. It's sensible to assume that there are applications already hitting the database that cannot or will not go through your data access class. It's also sensible to assume that some future applications cannot or will not go through your data access class. As a trivial example, the command-line interface and the graphical interface to any dbms you use won't be using your data access class.
Views are also the way SQL databases implement logical data independence. Think of them as part of the public interface.
Views can be shared by interactive SQL users, report writers, OLAP tools, and applications written in different languages or by multiple programming teams that don't share classes with each other.
As such, it's a good way for database designers to share standardized queries across the whole community of users of the data.
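One everyday consequence of that public-interface role, sketched here with hypothetical names, is that you can grant users rights on a view without exposing the underlying table at all:

CREATE VIEW dbo.CustomerDirectory AS
SELECT Id, FullName, City  -- deliberately leaves out sensitive columns
FROM dbo.Customer;
GO
GRANT SELECT ON dbo.CustomerDirectory TO ReportingUsers;
-- ReportingUsers needs no rights on dbo.Customer itself.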

Should I be concerned that ORMs, by default, return all columns?

In my limited experience working with ORMs (so far LLBLGen Pro and Entity Framework 4), I've noticed that, inherently, queries return data for all columns. I know NHibernate is another popular ORM, and I'm not sure whether this applies to it or not, but I would assume it does.
Of course, I know there are workarounds:
Create a SQL view and create models and mappings on the view
Use a stored procedure and create models and mappings on the result set returned
I know that adhering to certain practices can help mitigate this:
Ensuring your row counts are reasonably limited when selecting data
Ensuring your tables aren't excessively wide (large number of columns and/or large data types)
So here are my questions:
Are the above practices sufficient, or should I still consider finding ways to limit the number of columns returned?
Are there other ways to limit returned columns other than the ones I listed above?
How do you typically approach this in your projects?
Thanks in advance.
UPDATE: This sort of stems from the notion that SELECT * is thought of as a bad practice. See this discussion.
One of the reasons to use an ORM of nearly any kind is to delay a lot of those lower-level concerns and focus on the business logic. As long as you keep your joins reasonable and your table widths sane, ORMs are designed to make it easy to get data in and out, and that requires having the entire row available.
Personally, I consider issues like this premature optimization until encountering a specific case that bogs down because of table width.
First off: great question, and about time someone asked this! :-)
Yes, the fact that an ORM typically returns all columns of a database table is something you need to take into consideration when designing your systems. But as you've mentioned, there are ways around this.
The main point for me is to be aware that this is what happens: either a SELECT * FROM dbo.YourTable, or (better) a SELECT (list of all columns) FROM dbo.YourTable.
This is not a problem when you really want the whole object and all its properties, and as long as you load a few rows, that's fine, too - the convenience beats the raw performance.
You might need to think about changing your database structures a little bit - things like these (a sketch follows the list):
maybe put large columns like BLOBs into separate tables with a 1:1 link to your base table - that way, a select on the parent table doesn't grab all those large blobs of data
maybe put groups of columns that are optional, that might only show up in certain situations, into separate tables and link them - again, just to keep the base tables lean'n'mean
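A hedged sketch of the first idea, moving a BLOB into its own 1:1 table (all names hypothetical):

CREATE TABLE dbo.Document (
    Id     int IDENTITY PRIMARY KEY,
    Title  nvarchar(200) NOT NULL
);

CREATE TABLE dbo.DocumentContent (
    DocumentId  int PRIMARY KEY REFERENCES dbo.Document (Id),  -- shared key gives the 1:1 link
    Content     varbinary(max) NOT NULL                        -- the large payload
);

Everyday queries touch only dbo.Document; the ORM maps dbo.DocumentContent as a separate (typically lazily loaded) entity that is fetched only when the payload is actually needed.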
Also: avoid trying to "arm-wrestle" your ORM into doing bulk operations - that's just not their strong point.
And: keep an eye on performance, and try to pick an ORM that allows you to change certain operations into e.g. stored procedures - Entity Framework 4 allows this. So if the deletes are killing you - maybe you just write a Delete stored proc for that table and handle that operation differently.
The question here covers your options fairly well. Basically you're limited to hand-crafting the HQL/SQL. It's something you only want to do if you run into scalability problems, but when you do, in my experience it can have a very large positive impact. In particular, it saves a lot of disk and network IO, so your scalability can take a big jump. Not something to do right away, though: analyse, then optimise.
Are there other ways to limit returned columns other than the ones I listed above?
NHibernate lets you add projections to your queries so you wouldn't need to use views or procs just to limit your columns.
For me this has only been an issue when a table has LOTS of columns (more than 30, say) or when a column holds a lot of data, for example over 5000 characters in a field.
The approach I have used is to map another object to the existing table, but with only the fields I need. So for a search that populates a table with 100 rows I would have a MyObjectLite, but when I click to view the details of that row I would call GetById and return a MyObject that has all the columns.
Another approach is to use custom SQL or stored procs, but I think you should only go down this path if you REALLY need the performance gain and have users complaining. So unless there is a performance problem, do not waste your time trying to fix a problem that does not exist.
You can limit the number of returned columns by using projections, Transformers.AliasToBean and a DTO. Here is how it looks in the Criteria API:
var dtos = session.CreateCriteria(typeof(Package)) // the mapped entity; "Package" is assumed here
    .SetProjection(Projections.ProjectionList()
        .Add(Projections.Property("Id"), "Id")
        .Add(Projections.Property("PackageName"), "Caption"))
    .SetResultTransformer(Transformers.AliasToBean(typeof(PackageNameDTO)))
    .List<PackageNameDTO>();
In LLBLGen Pro, you can return Typed Lists which not only allow you to define which fields are returned but also allow you to join data so you can pull a custom list of fields from multiple tables.
Overall, I agree that for most situations, this is premature optimization.
One of the big advantages of using LLBLGen and other ORMs as well (I just feel confident speaking about LLBLGen because I have used it since its inception) is that the performance of the data access has been optimized by folks who understand the issues better than your average bear.
Whenever they figure out a way to further speed up their code, you get those changes "for free" just by re-generating your data layer or by installing a new dll.
Unless you consider yourself an expert at writing data access code, ORMs probably improve most developers' efficacy and accuracy.