Too many NamedQueries can be a memory problem? - eclipselink

Sometimes I think to optimization.
I have a lot of JPA entities, about 100, and their number will increase.
Each entity has at least 3 named query.
I read (but i don't remember where) that the queries are translated and stored as SQL at application's startup.
Now, does will be a cache over-usage having too many named query in my application?

Related

sqlite3 views affect performance?

Trying to understand databases better in general, and sqlite3 in particular:
Are views in sqlite3 mainly an organizational feature, allowing complex queries to be broken into a series of smaller ones; or do views actually affect the performance of queries that use them?
I noticed that views are stored in the database itself as part of the schema. Are views stored on disk, updated dynamically as dependent tables are updated; or are they evaluated on demand?
Thanks.
Views will always be executed on demand (sqlite3 or otherwise), so the results they return are never persistently stored.
As for performance, while I can't speak to sqlite3 specifically, usually using views will have slightly less overhead as the query parser/planner doesn't have to reparse the raw sql on each execution. It can parse it once, store its execution strategy, and then use that each time the query is actually run.
The performance boost you see with this will generally be small, in the grand scheme of things. It really only helps if its a fast query that you're executing frequently. If its a slow query you execute infrequently, the overhead associated with parsing the query is insignificant. Views do, of course, provide a level of organization which is nice.

Select * vs specific columns & loading object properties

I always thought SELECT * was bad and that you should always return only the columns you are going to use. One of the reasons for this is that the DB can return the result without hitting any tables if all the columns needed are in the index.
I have a factory class that loads the properties of a Product object. It loads all the properties everytime GetProduct is called etc.
Many of the pages won't be using all of the Product properties even though they will be loaded from the database because of the SELECT*.
Is there any design advice/guidelines on this?
The trade-off here is between squeaking out every last bit of potential performance versus code maintainability. There is no question that bringing back columns you won't use wastes some CPU cycles. The question becomes: how many? Then you have to consider what is more expensive, your wasted CPU cycles or your programmers' time for building and maintaining the code?
If you are working on a system with huge performance requirements then it may very well pay to optimize your ORM / factory code. On the other hand, if you're building a departmental line of business app and you've got scores or hundreds of ORM classes, maybe you are better off keeping it simple for the programmers (and the people who have to pay for them) and stop worrying about a few cycles. This becomes even more the case if you use a framework that scaffolds up most of your ORM code for you with code generation - like Entity Framework (or many others)...
If you are building your system without the use of any kind of code generating framework, and if your data access layer is pretty close to bare metal SQL then only bringing back what you need is good advice. If you are building an app that is going to be used by thousands or millions of people simultaneously, then by all means tune your SQL from the outset. If, on the other hand, you work in a shop that uses ORM frameworks and RAD or agile then writing dozens of SQLs is counter productive.
I'd definitely avoid SELECT *. Just retrieve the data you know you'll need. I'd prefer to write a dozen queries to the same table, where each one refers to just the few columns I need for a particular purpose, rather than write one query that retrieves all the columns and just use that everywhere.
Even if you know you need every column currently in a table, list each one explicitly. That way, if someone adds half a dozen more columns to the table in the future, all your old queries won't suddenly be retrieving more data than is needed.

Should I be concerned that ORMs, by default, return all columns?

In my limited experience in working with ORMs (so far LLBL Gen Pro and Entity Framework 4), I've noticed that inherently, queries return data for all columns. I know NHibernate is another popular ORM, and I'm not sure that this applies with it or not, but I would assume it does.
Of course, I know there are workarounds:
Create a SQL view and create models and mappings on the view
Use a stored procedure and create models and mappings on the result set returned
I know that adhering to certain practices can help mitigate this:
Ensuring your row counts are reasonably limited when selecting data
Ensuring your tables aren't excessively wide (large number of columns and/or large data types)
So here are my questions:
Are the above practices sufficient, or should I still consider finding ways to limit the number of columns returned?
Are there other ways to limit returned columns other than the ones I listed above?
How do you typically approach this in your projects?
Thanks in advance.
UPDATE: This sort of stems from the notion that SELECT * is thought of as a bad practice. See this discussion.
One of the reasons to use an ORM of nearly any kind is to delay a lot of those lower-level concerns and focus on the business logic. As long as you keep your joins reasonable and your table widths sane, ORMs are designed to make it easy to get data in and out, and that requires having the entire row available.
Personally, I consider issues like this premature optimization until encountering a specific case that bogs down because of table width.
First of : great question, and about time someone asked this! :-)
Yes, the fact an ORM typically returns all columns for a database table is something you need to take into consideration when designing your systems. But as you've mentioned - there are ways around this.
The main fact for me is to be aware that this is what happens - either a SELECT * FROM dbo.YourTable, or (better) a SELECT (list of all columns) FROM dbo.YourTable.
This is not a problem when you really want the whole object and all its properties, and as long as you load a few rows, that's fine, too - the convenience beats the raw performance.
You might need to think about changing your database structures a little bit - things like:
maybe put large columns like BLOBs into separate tables with a 1:1 link to your base table - that way, a select on the parent tables doesn't grab all those large blobs of data
maybe put groups of columns that are optional, that might only show up in certain situations, into separate tables and link them - again, just to keep the base tables lean'n'mean
Also: avoid trying to "arm-wrestle" your ORM into doing bulk operations - that's just not their strong point.
And: keep an eye on performance, and try to pick an ORM that allows you to change certain operations into e.g. stored procedures - Entity Framework 4 allows this. So if the deletes are killing you - maybe you just write a Delete stored proc for that table and handle that operation differently.
The question here covers your options fairly well. Basically you're limited to hand-crafting the HQL/SQL. It's something you want to do if you run into scalability problems, but if you do in my experience it can have a very large positive impact. In particular, it saves a lot of disk and network IO, so your scalability can take a big jump. Not something to do right away though: analyse then optimise.
Are there other ways to limit returned columns other than the ones I listed above?
NHibernate lets you add projections to your queries so you wouldn't need to use views or procs just to limit your columns.
For me this has only been an issue if the tables has LOTS of columns > 30 or if the column had alot of data for example a over 5000 character in a field.
The approach I have used is to just map another object to the existing table but with only the fields I need. So for a search that populates a table with 100 rows I would have a
MyObjectLite, but when I click to view the Details of that Row I would call a GetById and return a MyObject that has all the columns.
Another approach is to use custom SQL, Stroed procs but I only think you should go down this path if you REALLY need the performance gain and have users complaining. SO unless there is a performance problem do not waste your time trying to fix a problem that does not exist.
You can limit number of returned columns by using Projection and Transformers.AliasToBean and DTO here how it looks in Criteria API:
.SetProjection(Projections.ProjectionList()
.Add(Projections.Property("Id"), "Id")
.Add(Projections.Property("PackageName"), "Caption"))
.SetResultTransformer(Transformers.AliasToBean(typeof(PackageNameDTO)));
In LLBLGen Pro, you can return Typed Lists which not only allow you to define which fields are returned but also allow you to join data so you can pull a custom list of fields from multiple tables.
Overall, I agree that for most situations, this is premature optimization.
One of the big advantages of using LLBLGen and other ORMs as well (I just feel confident speaking about LLBLGen because I have used it since its inception) is that the performance of the data access has been optimized by folks who understand the issues better than your average bear.
Whenever they figure out a way to further speed up their code, you get those changes "for free" just by re-generating your data layer or by installing a new dll.
Unless you consider yourself an expert at writing data access code, ORMs probably improve most developers efficacy and accuracy.

Coldfusion ORM Large Tables

Say if I have a large dataset, the table has well over a million records and the database is normalized so theres foreign keys and stuff. Ive set up the relations properly and i get a list of the first object applications = EntityLoad("entityName") but because of the relations and stuff the page takes like 24 seconds to load, even when i limit the number of records to show to like 5 it takes an awful long time to load.
My solution to this was create another object that just gets the list, and then when the user wants to , use the object with all the relations and show it to the user. Is this the right way to approach it, or am i missing a big ORM concept?
Are you counting just the time to get the data, or are you perhaps doing a CFDUMP on it or something else visually that could be slow. In other words, have you wrapped the EntityLoad by itself in a cftimer tag to be sure that it is the culprit?
The first thing I would do is enable SQL logging in your Application.cfc. Add logSQL=true to This.ormSettings.
That should allow you to grab the SQL that ORM generates. Run it in an analyzer. See if the ORM SQL is doing somethign crazy. See if it is an index that you missed or something.
Also are you doing paging as Ray talks about here: http://www.coldfusionjedi.com/index.cfm/2009/8/14/Simple-ColdFusion-9-ORM-Paging-Demo?
If not have you tried using ORMExecuteQuery and HQL to enable paging.
Those are my thoughts.
When defining complex domain models with Hibernate - you will sometimes need to tweak the mapping to improve performance. This is especially true if you are dealing with inheritance (not sure how much inheritance is in your model). The ultimate goal is to have your query pulling from as few tables as possible while still preserving your domain model. This might require using the advanced inheritance mappings (more on that in a sec).
LOGGING SQL
As Terry mentioned, you will want to be sure you can log the actual SQL that is being passed to your database (yeah, you don't totally get away from SQL with ORM). Here is a great article on setting up logging for Hibernate in CF9 from Rupesh:
http://www.rupeshk.org/blog/index.php/2009/07/coldfusion-orm-how-to-log-sql/
HIBERNATE MAPPING FILES
Anytime you want to do something beyond the basic, you want to be sure that you are looking at the actual Hibernate mapping files that are generated for your CFC's. Be sure to set the following with all of your hibernate options in Application.cfc:
savemapping = true
While the cfproperty properties allow you to define many aspects of the mapping, there are actually some things that can only be done in the Hibernate mapping files (and there are tons of community resources on this.
INHERITANCE MAPPING
As I mentioned earlier, Hibernate provides different inheritance strategies for mapping. They are Table per Hierarchy, Table per subclass, Table per concrete class, and implicit polymorphism. You can read more about these types in the CF9 docs under Advanced Mapping > Inheritance Mapping or in the Hibernate documentation (as it would take forever to explain each of these).
Knowing how your tables are mapped is very important with inheritance (and it is also where Hibernate can generate some HUGE queries if you don't tweak your setup).
Those are the things I can think of - if you can give some additional information about your domain model - we can look to see what other things might be done to tweak it.
There is a good chance Hibernate is doing it's caching thing. A fair comparison in my mind (everyone please feel free to add) is doing an:
EntityLoad("entity_name") is the same as doing a select * from TABLE
So, in this case, what Hibernate might be doing in instantiating the memory, and caching it a certain way, your database server might do this similarly when you sent such a broad SQL instruction.
I have been extremely interested in ORM the past few weeks and it looks to be a very rewarding undertaking.
For this reason, is there a tiem you would ever load all 500,000 records as a result? I assume not.
I have one large logging table that I will be attacking, I am finding that the SQL good stuff must be there. For example, mark the fields that are indexes as such, this will speed it up incredibly when searching. I am sure the ORM can handle this.
Beyond this:
Find some excellent Hibernate forums, resources, and tutorials so you can learn Hibernate. This isn't really as much a Coldfusion --> ORM issue as what Hibernate might do on it's own. I have ordered a few Hibernate books that I'm waiting on to see how they are.
Likewise there seems to be an incredible amount of Hibernate resources out there where you can bring the Performance enhancement solutions of Hibernate into the Coldfusion sphere. I might be making it too simple, but I see the CF-ORM implementation as a wrapper with some code generation to save us time.
Take a look at implementing filters to cut down your data in the EntityLoad() call.
As recommended in other threads, turn on sql logging and see what sql is being generated. Chances are it might not be what you need. Check out HQL to see if you can form a better statement.
Most importantly, share what you find. I'll volunteer to do the same on this as you've tempted me to go try this out in my spare time a bit sooner than planned.
Faisal, we ran into this with Linq (c# orm).
Our solution was to create simple objects not holding the relational data. For instance, along with Users we had a SimpleUsers object which held little or no relation to any other object and had a limited set of columns.
There could be other ways of handling this but this approach helped tremendously with the query speed.

SQL Server Views, blessing or curse? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
I once worked with an architect who banned the use of SQL views. His main reason was that views made it too easy for a thoughtless coder to needlessly involve joined tables which, if that coder tried harder, could be avoided altogether. Implicitly he was encouraging code reuse via copy-and-paste instead of encapsulation in views.
The database had nearly 600 tables and was highly normalised, so most of the useful SQL was necessarily verbose.
Several years later I can see at least one bad outcome from the ban - we have many hundreds of dense, lengthy stored procs that verge on unmaintainable.
In hindsight I would say it was a bad decision, but what are your experiences with SQL views? Have you found them bad for performance? Any other thoughts on when they are or are not appropriate?
There are some very good uses for views; I have used them a lot for tuning and for exposing less normalized sets of information, or for UNION-ing results from multiple selects into a single result set.
Obviously any programming tool can be used incorrectly, but I can't think of any times in my experience where a poorly tuned view has caused any kind of drawbacks from a performance standpoint, and the value they can provide by providing explicitly tuned selects and avoiding duplication of complex SQL code can be significant.
Incidentally, I have never been a fan of architectural "rules" that are based on keeping developers from hurting themselves. These rules often have unintended side-effects -- the last place I worked didn't allow using NULLs in the database, because developers might forget to check for null. This ended up forcing us to work around "1/1/1900" dates and integers defaulted to "0" in all the software built against the databases, and introducing a litany of bugs caused by devs working around places where NULL was the appropriate value.
You've answered your own question:
he was encouraging code reuse via copy-and-paste
Reuse the code by creating a view. If the view performs poorly, it will be much easier to track down than if you have the same poorly performing code in several places.
Not a big fan of views (Can't remember the last time I wrote one) but wouldn't ban them entirely either. If your database allows you to put indexes on the views and not just on the table, you can often improve performance a good bit which makes them better. If you are using views, make sure to look into indexing them.
I really only see the need for views for partitioning data and for extremely complex joins that are really critical to the application (thinking of financial reports here where starting from the same dataset for everything might be critical). I do know some reporting tools seem to prefer views over stored procs.
I am a big proponent of never returning more records or fields than you need in a specific instance and the overuse of views tends to make people return more fields (and in way too many cases, too many joins) than they need which wastes system resources.
I also tend to see that people who rely on views (not the developer of the view - the people who only use them) often don't understand the database very well (so they would get the joins wrong if not using the view) and that to me is critical to writing good code against the database. I want people to understand what they are asking the db to do, not rely on some magic black box of a view. That is all personal opinion of course, your mileage may vary.
Like BlaM I personally haven't found them easier to maintain than stored procs.
Edited in Oct 2010 to add:
Since I orginally wrote this, I have had occasion to work with a couple of databases designed by people who were addicted to using views. Even worse they used views that called views that called views (to the point where eventually we hit the limit of the number of tables that can be called). This was a performance nightmare. It took 8 minutes to get a simple count(*) of the records in one view and much longer to get data. If you use views, be very wary of using views that call other views. You will be building a system that will very probably not work under the normal performance load on production. In SQL Server you can only index views that do not call other views, so what ends up happening when you use views in a chain, is that the entire record set has to be built for each view and it is not until you get to the last one that the where clause criteria are applied. You may need to generate millions of records just to see three. You may join to the same table 6 times when you really only need to join to it once, you may return many many more columns than you need in the final results set.
My current database was completely awash with countless small tables of no more than 5 rows each. Well, I could count them but it was cluttered. These tables simply held constant type values (think enum) and could very easily be combined into one table. I then made views that simulated each of the tables I deleted to ensure backward compactability. Worked great.
One thing that hasn't been mentioned thus far is use of views to provide a logical picture of the data to end users for ad hoc reporting or similar.
This has two merits:
To allow the user to single "tables" containing the data they expect rather requiring relatively non technical users to work out potentially complex joins (because the database is normalised)
It provides a means to allow some degree of ah hoc access without exposing the data or the structure to the end users.
Even with non ad-hoc reporting its sometimes signicantly easier to provide a view to the reporting system that contains the relveant data, neatly separating production of data from presentation of same.
Like all power, views have its own dark side. However, you cannot blame views for somebody writing bad performing code. Moreover views can limit the exposure of some columns and provide extra security.
Views are good for ad-hoc queries, the kind that a DBA does behind the scenes when he/she needs quick access to data to see what's going on with the system.
But they can be bad for production code. Part of the reason is that it's sort of unpredictable what indexes you will need with a view, since the where clause can be different, and therefore hard to tune. Also, you are generally returning a lot more data than is actually necesary for the individual queries that are using the view. Each of these queries could be tightened up and tuned individually.
There are specific uses of views in cases of data partitioning that can be extremely useful, so I'm not saying they should avoided altogether. I'm just saying that if a view can be replaced by a few stored procedures, you will be better off without the view.
We use views for all of our simple data exports to csv files. This simplifies the process of writing a package and embedding the sql within the package which becomes cumbersome and hard to debug against.
Using views, we can execute a view and see exactly what was exported, no cruft or unknowns. It greatly helps in troubleshooting problems with improper data exports and hides any complex joins behind the view. Granted, we use a very old legacy system from a TERMS based system that exports to sql, so the joins are a little more complex than usual.
Some time ago I've tried to maintain code that used views built from views built from views... That was a pain in the a**, so I got a little allergic to views :)
I usually prefer working with tables directly, especially for web applications where speed is a main concern. When accessing tables directly you have the chance to tweak your SQL-Queries to achieve the best performance. "Precompiled"/cached working plans might be one advantage of views, but in many cases just-in-time compilation with all given parameters and where clauses in consideration will result in faster processing over all.
However that does not rule out views totally, if used adequately. For example you can use a view with the "users" table joined with the "users_status" table to get an textual explanation for each status - if you need it. However if you don't need the explanation: use the "users" table, not the view. As always: Use your brain!
Views have been helpful to us in their role for use by public web based applications that dip from a production database. Simplified security is the primary advantage we see since the table design in the database may combine sensitive and non-sensitive data within the same table. A stored procedure shares much of this advantage, but the view is read-only, has potential interop advantages, and is a less complex thing for junior people to implement.
This security abstraction advantage also applies when views are used for end-user ad-hoc queries; this would be less of an advantage if we had a proper, flattened, data warehouse representation of our data.
From an application stand point which uses an ORM, it's a lot harder to execute a custom query than doing a select on a discretely mapped type (eg, the view).
For example, if you need just 5 fields of a table that has many (say 30 or 40) an ORM framework will create an entity to represent the table.
That means that even though you only need a few properties of the entity, the select query generated by the ORM framework will bring the entire entity in its full glory. A view on the other hand, although also mapped to an entity with the ORM framework, will only bring the data you need.
Second, since ORM frameworks map entities to tables, relationships between entities are generated (and hydrated) on the client side, meaning that the query has to execute and return to the app before linking of those entities can happen at runtime within the app.
Some frameworks bypass that by returning the data from multiple linked entities in a giant select (with multiple joins), bringing in the columns of all related tables in one call. Internally the framework disassembles the giant result set and structures the logical presentation of the linked entities before returning those entities to the caller app.
Point being is that views are a life saver for apps using ORM. The alternative is to manually make db calls, and manually passing the resulting recordsets into usable entities/models.
While this approach is good and definitely produces a result, it has lots of negative facets. Manual code... is manual; hard to maintain, cumbersome in implementation, and causes devs to worry more about the specifics of the DB provider API vs the logical domain model. Not to mention that it increases time to production (its a lot more labourious) costs for development, maintenance, surface area of bugs, etc.
So for anyone saying views are bad, please consider the other side of things; The stuff the high and mighty DBA's most often have no clue about.
Let's see if I can come up with a lame analogy ...
"I don't need a phillips screwdriver. I carry a flat head and a grinder!"
Dismissing views out of hand will cause pain long term. For one, it's easier to debug and modify a single view definition than it is to ship modified code.
Views can also reduce the size of complex queries (in the same way stored procs can).
This can reduce network bandwith for very busy databases.