Is lazy loading better than eager loading?

I heard that lazy loading is better than eager loading. Is that true? If so, why? I want to know the reasons behind it.

To keep this objective, I will first define what "better" means in my answer: when comparing lazy loading and eager loading, "better" means making a list of priorities and deciding wisely what to load eagerly and what to load lazily.
In general, you have n resources, and quite a few factors push each of them toward eager or lazy loading:
slowness of the given feature
server load it causes
probability that the user will use that feature
average importance of the feature to the user
So, if we are talking about a feature that is rarely used and users tend to be patient with it, lazy loading is "better". However, if it is highly probable that the user will use it, eager loading is "better". It boils down to weighing how important it is for that feature/content to be ready against the server load it causes. This is a conflict of interest between the user, who would like to have everything instantly, and the server, which has limited capacity. It is advisable to monitor server load regularly and to think about the UX of your software.
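The trade-off above can be sketched as a scoring function. This is only an illustration: the `Feature` fields mirror the four factors in the list, but the formula, the 0..1 scales and the zero threshold are assumptions of this sketch, not a standard method.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    load_seconds: float       # slowness: how long the user waits if it is loaded lazily
    server_cost: float        # relative server load of loading it (assumed 0..1 scale)
    usage_probability: float  # chance the user opens it (0..1)
    importance: float         # average importance to the user (assumed 0..1 scale)


def eager_score(f: Feature) -> float:
    # Benefit of having it ready: likely use x importance x waiting time saved
    benefit = f.usage_probability * f.importance * f.load_seconds
    # Cost of loading it for everyone, including users who never open it
    cost = (1.0 - f.usage_probability) * f.server_cost
    return benefit - cost


def plan(features, threshold: float = 0.0):
    """Split features into (eager, lazy) name lists, highest score first."""
    ranked = sorted(features, key=eager_score, reverse=True)
    eager = [f.name for f in ranked if eager_score(f) > threshold]
    lazy = [f.name for f in ranked if eager_score(f) <= threshold]
    return eager, lazy
```

For example, a dashboard almost every user opens scores positive and is loaded eagerly, while a rarely used, server-heavy export scores negative and is loaded lazily. Tuning the weights against real server-load monitoring is the part the answer leaves to you.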

Related

Solution Cloning Performance Tips

We are currently trying to improve the performance of a planning problem we've implemented in OptaPlanner. Our model has ~45,000 chained variables and after profiling the application it seems like the main bottleneck is around the cloning. Approximately 90% of the CPU run-time is consumed by the FieldAccessingSolutionCloner method calls.
We've already tried to make our object model more lightweight by reducing the number of Maps and Sets within the PlanningEntities and changing fields to primitives where possible, but from your own OptaPlanner experience, do you have any advice on how to speed up cloning?
Have you tried writing a custom cloner? See the docs.
The default one has to rely on reflection, so it's slower.
Also, the structure of your domain model influences how much you need to clone (whether or not you write a custom cloner):
If you delete your Solution and Planning Entities classes, do your other domain classes still compile?
If yes, then the clone is minimal. If no, it's not.
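The idea of a minimal clone can be illustrated language-neutrally. The sketch below is in Python and is not OptaPlanner's API (a real OptaPlanner custom cloner is written in Java; see the docs). All class names are hypothetical: problem facts that never change during solving are shared by reference, and only planning entities are copied. It deliberately omits chained variables, where entity-to-entity references would also have to be remapped to the cloned entities.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Vehicle:
    # Problem fact: never changes during solving, so clones can share it
    name: str
    capacity: int


@dataclass
class Visit:
    # Planning entity: its planning variable changes, so it must be copied
    location: str
    vehicle: Optional[Vehicle] = None


@dataclass
class Schedule:
    # The "solution" that the solver clones whenever it finds a new best score
    vehicles: List[Vehicle]
    visits: List[Visit]
    score: Optional[int] = None


def planning_clone(s: Schedule) -> Schedule:
    """Copy each entity once; share the problem-fact list by reference."""
    return Schedule(
        vehicles=s.vehicles,                  # shared, not copied
        visits=[replace(v) for v in s.visits],  # new entity objects, same field values
        score=s.score,
    )
```

The less your entities reference mutable objects outside the solution, the closer a hand-written clone like this gets to the "minimal clone" described above, and the less a reflection-based cloner has to walk.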

Scaling CakePHP Version 2.3.0

I'm beginning a new project using CakePHP. I like the "auto-magic" features, and I think it's a good fit for the project. I'm wondering about the potential to scale CakePHP to several million IP hits a day and hundreds of thousands of database reads and writes a day, with about 50,000 to 500,000 users, often 3,000 of them using the site concurrently. I'm relying heavily on stored procedures to offset this, and I'm using several servers, including a load balancer.
I'm wondering about the computational time of some of the auto-magic and how well Cake handles session requests that make many DB hits. Has anyone had success running Cake on a single-server-array setup with this level of traffic? I'm not using the cloud or a distributed database (yet). I'm really worried about potential bottlenecks with this framework, and I'm interested in advice from anyone who has run Cake in production. I've researched this, but I would love a second opinion. Thank you for your time.
This is not a problem, but optimization is up to you.
There are different caching methods available: Memcache, Redis, full-page caching... All of them are already supported by Cake. What you cache and where is up to you.
For search, you could try Elasticsearch to speed things up.
Dispatcher filters run before dispatch and can bypass controller instantiation (you might want that in special cases; see the asset filter for an example).
Use nginx, not Apache.
Also, I would not start by over-optimizing and over-thinking this before any code is written. Start well and think about caching, but analyse and fix bottlenecks when you actually come across them. Otherwise you'll waste a lot of time on optimization before you have even written anything that works.
Cake itself is very fast. Just to prove the bullshit factor of the fancy benchmarks some frameworks publish, we did one using a dispatcher filter to "optimize" it and even beat Yii, which seems pretty eager to show how fast it is. But benchmarks are pointless, especially in a huge project where so many human-made failures can be introduced.
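The caching advice boils down to the cache-aside pattern: look in the cache first, and only run the expensive query on a miss or after expiry. Here is a minimal in-process sketch in Python; `TTLCache` and `get_or_compute` are hypothetical names standing in for Memcache/Redis, not Cake's own cache API.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process stand-in for Memcache/Redis (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: skip the database entirely
        value = compute()  # miss or expired: do the expensive work once
        self._data[key] = (now + self.ttl, value)
        return value
```

Choosing the key and the TTL per query is the "what you cache and where" decision from the answer; with a shared cache server instead of a dict, every web node benefits from the same entries.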

Concurrent page request comparisons

I have been hoping to find out what different server setups equate to in theory for concurrent page requests, and the answer always seems to be soaked in voodoo and sorcery. What is the approximate maximum number of concurrent page requests for the following setups?
apache+php+mysql(1 server)
apache+php+mysql+caching (like memcached or similar; still one server)
apache+php+mysql+caching+dedicated Database Server (2 servers)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/single dbserver)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/multi dbserver)
+distributed (amazon cloud elastic) -- I know this one is "as much as you can afford" but it would be nice to know when to move to it.
I appreciate any constructive criticism. I am just trying to figure out when it's time to move from one implementation to the next, because each comes with its own implementation effort, either programming-wise or setup-wise.
In your question you talk about caching, and this is probably one of the most important factors in a web architecture with regard to performance and capacity.
Memcache is useful, but before that, you should ensure proper HTTP cache directives on your server responses. This does two things: it reduces the number of requests, and it speeds up server response times (if you have Apache configured correctly). This can be improved further with an HTTP accelerator like Varnish and a CDN.
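Here is a minimal Python sketch of what "proper HTTP cache directives" means in practice: a `Cache-Control: max-age` header lets browsers, Varnish and CDNs reuse a response without asking, and an `ETag` lets them revalidate cheaply with `If-None-Match`, getting a body-less 304 when nothing changed. The `respond` function and the 300-second default are assumptions for illustration; real `If-None-Match` headers can carry several tags, which this sketch ignores.

```python
import hashlib
from typing import Dict, Optional, Tuple


def respond(body: bytes, if_none_match: Optional[str],
            max_age: int = 300) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a cacheable GET response."""
    # A strong validator derived from the content: same bytes, same ETag
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        # Shared caches (Varnish, CDN) and browsers may reuse this for max_age seconds
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": etag,
    }
    if if_none_match == etag:
        return 304, headers, b""  # client's copy is still valid: send no body
    return 200, headers, body
```

Every 304 or cache hit is a request your PHP and MySQL tiers never have to serve, which is why this comes before Memcache in the answer's ordering.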
Another factor to consider is whether your system is stateless. Stateless usually means that it doesn't store sessions on the server and reference them with every request. A good systems architecture relies on state as little as possible: the less state, the more horizontally scalable the system. Most people introduce state when confronted with personalisation, i.e. serving different content to different users. In such cases you should first investigate HTML5 session storage (i.e. storing the complete user data in JavaScript on the client, obviously over HTTPS) or, if the data set is smaller, secure JavaScript cookies. That way you can still serve cached resources and then personalise them with JavaScript on the client.
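One common way to keep client-held user data trustworthy without a server-side session store is to sign it with an HMAC, so any web node holding the key can verify it. This Python sketch is an assumption-laden illustration (the `SECRET` value and the `sign`/`verify` names are hypothetical): it only signs, it does not encrypt, so the payload is readable by the client and must not contain secrets.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"change-me"  # hypothetical key, shared by every web node behind the load balancer


def sign(data: dict) -> str:
    """Encode data and append an HMAC so tampering can be detected later."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    mac = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + mac  # '.' never occurs in urlsafe base64


def verify(token: str):
    """Return the original dict, or None if the token was altered."""
    payload, _, mac = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):  # constant-time comparison
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because no node keeps per-user state, any request can go to any web server, which is exactly what makes the tier horizontally scalable.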
Finally, your stack includes a database tier, another potential bottleneck for performance and capacity. If you are only reading data from the system, it should again be quite easy to scale horizontally. If there are reads and writes, it's typically better to keep the read-write dataset in one database and the read-only data in another. You can then use the most appropriate scaling method for each.
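A related and common way to act on the read/write distinction is to route writes to a primary database and spread reads across replicas. The Python sketch below is a simplification with hypothetical host names: it classifies statements by their first keyword and ignores replication lag, which in a real setup means a user may not immediately read back their own write from a replica.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send writes to the primary, spread reads round-robin over replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        # Naive classification by the statement's first keyword
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._replicas)
```

Adding read capacity then means adding replicas, while the write side is scaled separately (bigger primary, sharding), which is the "more relevant methods" the answer refers to.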
These setups do not spit out a single number that you can then compare with the others. The answer depends on far more factors than you have listed.
Even if they did spit out a single answer, then it is just one metric out of dozens. What makes this the most important metric?
Even worse, none of these alternatives is free. Each involves engineering effort and maintenance overhead, which cannot be analysed without understanding your organisation, your app, and your cost/revenue structures.
Options like AWS not only involve development effort but may "lock you in" to a solution so you also need to be aware of that.
I know this response is not complete, but I am pointing out that this question touches on a large complicated area that cannot be reduced to a single metric.
I suspect you are approaching this from exactly the wrong end. Do not go looking for technologies and then figure out how to use them. Instead profile your app (measure, measure, measure), figure out the actual problem you are having, and then solve that problem and that problem only.
If you understand the problem and you understand the technology options then you should have an answer.
If you have already done this and the problem is concurrent page requests then I apologise in advance, but I suspect not.
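"Measure, measure, measure" can start as simply as timing the candidate code paths before reaching for new infrastructure. This Python helper is a hypothetical sketch; for finding where time goes inside a request, a real profiler (for Python, the standard `cProfile` module) gives a per-function breakdown.

```python
import statistics
import time
from typing import Callable, Dict


def measure(candidates: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Return the median wall-clock seconds for each candidate code path."""
    results = {}
    for name, fn in candidates.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            samples.append(time.perf_counter() - start)
        # Median is less sensitive than the mean to one-off hiccups (GC, other load)
        results[name] = statistics.median(samples)
    return results
```

Only once numbers like these point at a specific bottleneck (page render, a query, session handling) is it worth picking the technology that addresses it.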

Performance implications of changing NSManagedObject instances that I never intend to save

I have a CoreData-based application that retrieves data about past events from an SQLite persistence store. Once I have the past events my application does some statistical analysis to predict future events based on the data it has about past events. Once my application has made a prediction about future events I want to run another algorithm that does some evaluation of that prediction. I'm expecting to do a lot of these evaluations, so performance optimization for each evaluation is likely to be critical.
Now, all of the classes I need to represent my future event predictions exist in my data model, and I have NSManagedObject subclasses for most of the important entities. The easiest way for me to implement my algorithms is to "fill in" the results for future events based on the prediction, and then run my evaluation using NSManagedObject instances for both the past events and the predictions for future events. However, I typically don't want to save these future event predictions in my persistent store: Once I have performed my evaluation of the prediction I want to throw away the predictions and just keep the evaluation results. I can do this pretty easily, I think, by just sending the rollback: message to my managed object context once my evaluation is complete.
That will all work fine, and from a coding perspective it seems like it will be quite easy to implement. However, I am wondering if I should expect performance concerns making such heavy use of managed objects when I have no intention of ever saving the changes I'm making. Given that performance is likely to be a factor, does using NSManagedObject instances for this make sense? Surely all the things it's doing to keep track of changes and support things like undo and complex entity relationships come with some amount of overhead. Should I be concerned about this overhead?
I could of course create non-NSManagedObject classes that implement an optimized version of my model classes for use when making predictions and evaluating them. That would involve a lot of additional work, including the work necessary to copy data back and forth between the NSManagedObject instances for past events and the optimized class instances for future events: I'd rather not create that code if it is not needed.
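The "make scratch changes, evaluate, then roll back" workflow is not specific to Core Data. As an analogy only (this is Python's built-in `sqlite3`, not the Core Data API, and the `events` table and its columns are hypothetical), the throwaway predictions are written inside a transaction, scored, and discarded so that only the evaluation result survives.

```python
import sqlite3


def evaluate_prediction(conn: sqlite3.Connection, predicted: list) -> float:
    """Insert throwaway predictions, score them, then roll them back."""
    try:
        # sqlite3 implicitly opens a transaction before this INSERT
        conn.executemany(
            "INSERT INTO events(kind, value, predicted) VALUES (?, ?, 1)", predicted)
        # Uncommitted rows are visible on the same connection, so we can evaluate them
        (score,) = conn.execute(
            "SELECT AVG(value) FROM events WHERE predicted = 1").fetchone()
        return score
    finally:
        conn.rollback()  # discard the predictions; only the returned score survives
```

Whether the change tracking behind such a rollback is costly enough to matter is exactly the question; the answer below argues that you should measure before replacing it with custom classes.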
Surely all the things it's doing to keep track of changes and support things like undo and complex entity relationships come with some amount of overhead.
Core Data doesn't have the overhead that people expect owing to its optimizations. In general, using managed objects in memory is as fast or faster than any custom objects and management code you write yourself.
Should I be concerned about this overhead?
I can't really say without implementation details, but most likely not. You can hand-tweak Core Data for specific circumstances to get better performance.
The best approach is always to start with the simplest solution and move to a more complex one only when testing reveals that the simple solution does not perform well.
Premature optimization is the root of all evil.

Is ORM slow? Does it matter? [closed]

I really like ORM compared to stored procedures, but one thing I'm afraid of is that ORM could be slow because of its layers and layers of abstraction. Will using an ORM slow down my application? Or does it matter?
Yes, it matters. It uses more CPU cycles and consequently slows your application down. Hear me out, though...
But, consider this: what is more expensive? Server hardware or another programmer? Server hardware, generally, is cheaper than hiring another team of programmers. So, while ORM may be costing you CPU cycles, you need one less programmer to manage your SQL queries, often resulting in a lower net cost.
To determine if it's worth it for you, calculate or determine how many hours you saved by using an ORM. Then, figure out how much money you spent on the server to support ORM. Multiply the hours you saved by your hourly rate and compare to the server cost.
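The calculation described above is simple arithmetic; written out as a function (the name and all example figures are hypothetical):

```python
def orm_net_saving(hours_saved: float, hourly_rate: float, extra_server_cost: float) -> float:
    """Positive: the ORM paid for itself. Negative: the extra hardware cost more."""
    developer_savings = hours_saved * hourly_rate
    return developer_savings - extra_server_cost
```

For instance, 200 hours saved at 50 per hour is 10,000 in developer time; against 3,000 of additional server cost, the net saving is 7,000. The hard part in practice is estimating `hours_saved` honestly.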
Of course, whether an ORM actually saves you time is a whole other debate...
Is ORM slow?
Not inherently. Some heavyweight ORMs can add a general drag to things but we're not talking orders of magnitude slowdown.
What does make ORM slow is naïve usage. If you're using an ORM because it looks easy and you don't know how the underlying relational data model works, you can easily write code that seems reasonable to an OO programmer, but will murder performance.
ORM is a handy tool, but you need the lower-level understanding (that usually comes from writing SQL queries) to go with it.
Does it matter?
If you end up performing a looped query for each of thousands of entities at once, instead of a single fast join, then certainly it can.
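The "looped query versus a single join" problem (often called the N+1 query problem) can be shown with Python's built-in `sqlite3` and a hypothetical authors/posts schema. Both functions return the same rows, but the looped version makes one extra round trip per author; ORMs typically offer an eager-fetch or join option that produces the second shape instead of the first.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors(id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts(id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    """)
    conn.executemany("INSERT INTO authors VALUES (?, ?)",
                     [(i, f"author{i}") for i in range(1, 101)])
    conn.executemany("INSERT INTO posts(author_id, title) VALUES (?, ?)",
                     [(i, f"post by {i}") for i in range(1, 101)])
    return conn


def titles_n_plus_one(conn: sqlite3.Connection):
    """What naive object navigation often does: 1 query, then 1 more per author."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    result = []
    for author_id, name in authors:
        queries += 1  # one extra round trip per author: the "looped query"
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)).fetchall()
        result.extend((name, title) for (title,) in rows)
    return result, queries


def titles_join(conn: sqlite3.Connection):
    """Same data in a single query."""
    rows = conn.execute("""
        SELECT a.name, p.title
        FROM authors a JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """).fetchall()
    return rows, 1
```

With 100 authors that is 101 queries versus 1; against a networked database, each extra query also pays a round-trip latency, which is where the "murdered" performance comes from.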
ORMs are slower and do add overhead to applications (unless you specifically know how to get around these problems, which is not very common). The database is the most critical element, and web applications should be designed around it.
Many OOP frameworks that use Active Record or ORMs, and developers in general, treat the database as an unimportant afterthought and tend to see it as something they don't really need to learn. But performance and scalability usually suffer as the DB is heavily taxed!
Many large-scale web apps have fallen flat, wasting millions and months to years of time, because they didn't recognize the importance of the database. Hundreds of concurrent users and tables with millions of records require database tuning and optimization. But I believe the problem is noticeable even with a few users and less data.
Why are developers so afraid to learn proper SQL and tuning measures when it's the key to performance?
In a Windows Mobile 5 project using SqlCe, I went from hand-coded objects to code-generated (CodeSmith) objects using an ORM template. In the process, all my data access used CSLA as a base layer.
The straight conversion improved my performance by 32% in local testing, almost all of it a result of better access methods.
After that change, we adjusted the templates (after seeing some SqlCe performance material by Steve Lasker at PDC), and in less than 20 minutes our entire data layer was greatly improved: our average 'slow' calls went from 460 ms to ~20 ms. The cool part about the ORM approach is that we only had to implement (and unit test) these changes once, and all the data access code got changed. It was an amazing time saver; we saved maybe 40 hours or more.
The above being said, we did lose some time by taking out a bunch of 'waiting' and 'progress' dialogs that were no longer needed.
I have used a few of the ORM tools, and I can recommend two of them:
.NET Tiers
CSLA codegen templates
Both of them have performed quite nicely and any performance loss has not been noticeable.
I've always found it doesn't matter. You should use whatever will make you the most productive, responsive to changes, and whatever is easiest to debug and maintain.
Most applications never carry enough load for the difference between ORM and SPs to be noticeable. And there are optimizations that make ORM faster.
Finally, a well-written app will have its data access separated from everything else, so that switching from ORM to anything else would be possible in the future.
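Keeping data access separate is often done with a repository interface that business code depends on. In this Python sketch (all names hypothetical), a hand-written SQL implementation and an in-memory one are interchangeable; an ORM-backed class with the same two methods could replace either without touching `greet`.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class UserRepository(Protocol):
    def get_name(self, user_id: int) -> Optional[str]: ...
    def add(self, user_id: int, name: str) -> None: ...


class SqlUserRepository:
    """Hand-written SQL behind the interface."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS users(id INTEGER PRIMARY KEY, name TEXT)")

    def get_name(self, user_id: int) -> Optional[str]:
        row = self.conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None

    def add(self, user_id: int, name: str) -> None:
        self.conn.execute("INSERT INTO users(id, name) VALUES (?, ?)", (user_id, name))


class InMemoryUserRepository:
    """Handy for tests; shows the caller does not care about storage."""

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}

    def get_name(self, user_id: int) -> Optional[str]:
        return self.rows.get(user_id)

    def add(self, user_id: int, name: str) -> None:
        self.rows[user_id] = name


def greet(repo: UserRepository, user_id: int) -> str:
    # Business code depends only on the interface, not on the data-access technology
    name = repo.get_name(user_id)
    return f"Hello, {name}!" if name else "Hello, stranger!"
```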
Is ORM slow?
Yes (compared with stored procedures).
Does it matter?
No (unless your concern is speed).
I think the problem is that many people think of ORM as an object "trick" for databases, a way to code less or simplify SQL usage, while in reality it is... well, an Object-to-Relational (DB) Mapping.
ORM is used to persist your objects to a relational database management system, not (just) to replace SQL or make it easier (although it does a good job at that too).
If you don't have a good object model, or you're using it to build reports, or you're just trying to get some information out, ORM is not worth it.
If, on the other hand, you have a complex system modeled through objects, where each one has different rules and they interact dynamically, and your concern is persisting that information to the database rather than replacing some existing SQL scripts, then go for ORM.
Yes, ORM will slow down your application. By how much depends on how far the abstraction goes, how well your object model maps to the database, and other factors. The question should be, are you willing to spend more developer time and use straight data access or trade less dev time for slower runtime performance.
Overall, the good ORMs have little overhead and, by and large, are considered well worth the trade off.
Yes, ORMs affect performance; whether that matters ultimately depends on the specifics of your project.
Programmers often love ORM because they like nice front-end coding environments like Visual Studio and dislike coding raw SQL with no IntelliSense, etc.
ORMs have other limitations besides the performance hit: they often do not do what you need 100% of the time, they add the complexity of an additional abstraction layer that must be maintained and re-established every time changes are made, and there are caching issues to deal with.
Just a thought -- if the database vendors would make the SQL programming environment as nice as Visual Studio, and provide a more natural linkage between the db code and front-end code, we wouldn't need the ORMs...I guess things may go in that direction eventually.
Obvious answer: It depends
ORM does a good job of insulating a programmer from SQL. This in effect substitutes mediocre, computer generated queries for the catastrophically bad queries a programmer might give.
Even in the best case, an ORM is going to do some extra work, loading fields it doesn't need to, explicitly checking constraints, and so forth.
When these become a bottleneck, most ORMs let you side-step them and inject raw SQL.
If your application fits well with objects, but not quite so easily with relations, then this can still be a win. If instead your app fits nicely around a relational model, then the ORM represents a coding bottleneck on top of a possible performance bottleneck.
One thing I've found particularly offensive about most ORMs is their handling of primary keys. Most ORMs require PKs for everything they touch, even when there is no conceivable use for them. Example: authors should have PKs, and blog posts SHOULD have PKs, but the links (join table) between authors and posts should not.
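For reference, a link table without a separate surrogate id column can be keyed by the pair it links, which still prevents duplicate links. The sketch uses Python's `sqlite3` with a hypothetical schema (`WITHOUT ROWID` is SQLite-specific); whether a given ORM can map such a table without adding its own id column varies by ORM.

```python
import sqlite3


def make_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors(id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts(id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        -- Link table: no surrogate id; the (author_id, post_id) pair is the identity
        CREATE TABLE author_posts(
            author_id INTEGER NOT NULL REFERENCES authors(id),
            post_id   INTEGER NOT NULL REFERENCES posts(id),
            PRIMARY KEY (author_id, post_id)
        ) WITHOUT ROWID;
    """)
    return conn
```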
I have found that the difference between "too slow" and "not too much slower" depends on whether you have your ORM's 2nd-level (SessionFactory) cache enabled. With it off, it handles development load fine but will crush your system under mild production load. After turning on the 2nd-level cache, the server handled the expected load and scaled nicely.
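Conceptually, a second-level cache is a cache shared across sessions and keyed by entity type and id, so repeated loads of the same row stop hitting the database. In Hibernate/NHibernate it is enabled through configuration rather than written by hand; the Python class below is only a hypothetical sketch of the idea, including the eviction that keeps it from serving stale data.

```python
from typing import Callable, Dict, Tuple


class SecondLevelCache:
    """Shared across sessions: (entity type, id) -> loaded row (illustrative only)."""

    def __init__(self, load_from_db: Callable[[str, int], dict]):
        self._load = load_from_db
        self._store: Dict[Tuple[str, int], dict] = {}
        self.db_hits = 0

    def get(self, entity: str, key: int) -> dict:
        cache_key = (entity, key)
        if cache_key not in self._store:
            self.db_hits += 1
            self._store[cache_key] = self._load(entity, key)
        return dict(self._store[cache_key])  # each session gets its own copy

    def evict(self, entity: str, key: int) -> None:
        # Must be called when the row is updated, or readers will see stale data
        self._store.pop((entity, key), None)
```

Under production load, the same hot rows (products, settings, users) are read over and over, so most `get` calls never reach the database.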
An ORM can be an order of magnitude slower, not just because it wastes a lot of CPU cycles on its own, but also because it uses much more memory, which then has to be garbage-collected.
Much worse than that, however, is that there is no standard for ORM (unlike SQL), and that by and large ORMs use SQL very inefficiently. So at the end of the day you still have to dig into SQL to fix perf issues, and to debug every time an ORM makes a mess, meaning that you haven't gained anything at all.
It's terribly immature technology for real production-level applications. Particularly problematic are handling indexes and foreign keys, tweaking tables to fit object hierarchies, and terribly long transactions, which mean many more deadlocks and retries, if an ORM knows how to handle that at all.
It actually makes servers less scalable, which multiplies costs, but these costs don't get mentioned at the beginning: a little inconvenient truth :-) When something uses transactions 10-100 times bigger than optimal, it becomes impossible to scale the SQL side at all. I'm talking about serious systems again, not home/toy/academic stuff.
An ORM will always add some overhead because of the layers of abstraction, but unless it is a poorly designed ORM, that overhead should be minimal. If you are doing it correctly (for example, not loading the full object graph when it is not required), the time to actually query the database will be many times greater than the additional overhead of the ORM infrastructure. A good ORM (NHibernate) will also give you many options for the queries run against the database, so you can optimise as required.
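"Not loading the full object graph" usually means lazy-loading relations: the related rows are fetched only when first accessed. This Python sketch uses a hypothetical `Author` class and an injected loader in place of a real ORM session.

```python
from typing import Callable, List, Optional


class Author:
    def __init__(self, author_id: int, name: str, load_posts: Callable[[int], List[str]]):
        self.id = author_id
        self.name = name
        self._load_posts = load_posts
        self._posts: Optional[List[str]] = None  # not fetched yet

    @property
    def posts(self) -> List[str]:
        if self._posts is None:  # first access triggers the query
            self._posts = self._load_posts(self.id)
        return self._posts  # later accesses reuse the loaded list
```

The catch is the flip side of the looped-query problem: touching `.posts` inside a loop over many authors issues one query per author, so code that needs the relation for a whole list should ask the ORM to fetch it eagerly with a join.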
Using an ORM is generally slower. But the boost in productivity you get will get your application up and running much faster. And the time you save can later be spent finding the portions of your application that are causing the biggest slow down - you can then spend time optimizing the areas where you get the best return on your development effort. Just because you've decided to use an ORM doesn't mean you can't use other techniques in the sections of code that can really benefit from it.
An ORM can be slower, but this is offset by its ability to cache data: however fast the alternative, you can't get much faster than reading from memory.
I never really understood why people think that this is slower or that is slower... get a real machine, I say. I have had mixed results: I've seen stored procedures execute much slower than the ORM and vice versa, but in both cases the performance difference was due to differences in hardware.