How to increase performance for retrieving a complex business object? [closed] - sql

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Currently we have a complex business object which needs around 30 joins on our SQL database to retrieve one item (and this is our main use case). The database is around 2 GB in SQL Server.
We are using Entity Framework to retrieve data, and it takes around 3.5 seconds to retrieve one item. We have noticed that running subqueries in a parallel invoke performs better than using joins when the other table has a lot of rows (so we have something like 10 subqueries). We don't use stored procedures because we would like to keep the Data Access Layer in "plain C#".
The goal would be to retrieve the item in under 1 second without changing the environment too much.
We are looking into NoSQL solutions (RavenDB, Cassandra, Redis with the "document client") and the new "in-memory database" feature of SQL Server.
What do you recommend? Do you think that just one stored procedure call with EF would do the job?
EDIT 1:
We have indexes on all columns used in joins.

In my opinion, if you need 30 joins to retrieve one item, there is something wrong with the design of your database. Maybe it is correct from the relational point of view, but it is certainly impractical from the functional/performance point of view.
A couple of solutions come to mind:
Denormalize your database design.
I am pretty sure that you can reduce the number of joins, and improve your performance a lot, with that technique.
http://technet.microsoft.com/en-us/library/cc505841.aspx
Use a NoSQL solution like you mention.
Due to the number of SQL tables involved this is not going to be an easy change, but maybe you can start by introducing NoSQL as a cache for these complex objects.
NoSQL Use Case Scenarios or WHEN to use NoSQL
Of course, using stored procedures for this case is much better and will improve performance, but I do not believe it is going to make a dramatic change. You should try it and compare. Also review all your indexes.
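To illustrate the "NoSQL as a cache" idea above, here is a minimal read-through cache sketch. It is only an assumption about how such a cache could look: SQLite stands in for both the relational store and the document store, and the table and function names (`orders`, `order_lines`, `order_documents`, `assemble_from_tables`, `get_order`, `invalidate`) are hypothetical, not taken from the question.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_lines (order_id INTEGER, product TEXT, qty INTEGER);
    -- Hypothetical cache table: one pre-assembled JSON document per object.
    CREATE TABLE order_documents (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    INSERT INTO orders VALUES (1, 'Acme');
    INSERT INTO order_lines VALUES (1, 'bolt', 10), (1, 'nut', 20);
""")

def assemble_from_tables(order_id):
    """Slow path: build the object from the normalized tables."""
    customer = conn.execute(
        "SELECT customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()[0]
    lines = conn.execute(
        "SELECT product, qty FROM order_lines WHERE order_id = ? ORDER BY product",
        (order_id,),
    ).fetchall()
    return {
        "id": order_id,
        "customer": customer,
        "lines": [{"product": p, "qty": q} for p, q in lines],
    }

def get_order(order_id):
    """Fast path: return the cached document, building it on a cache miss."""
    row = conn.execute(
        "SELECT body FROM order_documents WHERE id = ?", (order_id,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    doc = assemble_from_tables(order_id)
    conn.execute(
        "INSERT INTO order_documents (id, body) VALUES (?, ?)",
        (order_id, json.dumps(doc)),
    )
    return doc

def invalidate(order_id):
    """Call this whenever any table that feeds the object changes."""
    conn.execute("DELETE FROM order_documents WHERE id = ?", (order_id,))
```

The cost of this approach is the invalidation step: every write path that touches one of the underlying tables has to remember to call it, which is the same consistency burden that denormalization brings.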

Related

should I create a counter column? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
Optimization was never one of my areas of expertise. I have a users table, and every user has many followers. Now I'm wondering whether I should use a counter column in case some user has a million followers. Instead of counting a whole table of relations, shouldn't I use a counter?
I'm working with SQL database.
Update 1
Right now I'm only planning how I should build my site. I haven't written the code yet. I don't know if I'll have slow performance, and that's why I'm asking you.
You should certainly not introduce a counter right away. The counter is redundant data and it will complicate everything. You will have to master the additional complexity and it'll slow down the development process.
Better to start with a normalized model and see how it works. If you really run into performance problems, solve them then.
Remember: premature optimization is the root of all evil.
It's generally a good practice to avoid duplication of data, such as summarizing one data point in another data's table.
It depends on what this is for. If this is for reporting, speed is usually not an issue and you can use a join.
If it has to do with the application and you're running into performance issues with a join or computed column, you may want to consider a summary table generated on a schedule.
If you're not seeing a performance issue, leave it alone.
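As a rough sketch of the two options discussed in these answers: counting directly from the normalized relation table, and a summary table rebuilt on a schedule. SQLite is used only as a stand-in, and the table and function names (`followers`, `follower_counts`, `count_followers`, `refresh_follower_counts`) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized model: one row per (user, follower) pair.
    -- The primary key doubles as an index on user_id.
    CREATE TABLE followers (
        user_id INTEGER NOT NULL,
        follower_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, follower_id)
    );
    -- Hypothetical summary table, only needed if counting becomes too slow.
    CREATE TABLE follower_counts (user_id INTEGER PRIMARY KEY, n INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO followers VALUES (?, ?)", [(1, 2), (1, 3), (2, 1)])

def count_followers(user_id):
    """Option 1: count on demand from the normalized table."""
    return conn.execute(
        "SELECT COUNT(*) FROM followers WHERE user_id = ?", (user_id,)
    ).fetchone()[0]

def refresh_follower_counts():
    """Option 2: rebuild the summary table; intended to run on a schedule."""
    with conn:  # delete and insert commit together
        conn.execute("DELETE FROM follower_counts")
        conn.execute(
            "INSERT INTO follower_counts "
            "SELECT user_id, COUNT(*) FROM followers GROUP BY user_id"
        )
```

The summary table is stale between refreshes, so it only fits cases where an approximate count is acceptable.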

SQL summing vs running totals [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I'm currently in disagreement with my colleague regarding the best design of our database.
We need to frequently access the total user balance from our database of transactions, potentially several times a second.
He says that SQL is fast and all we need to do is SUM() the transactions. I, on the other hand, believe that eventually, with enough users and a large database, our server will spend most of its time summing the same records. My solution is to have a separate table to keep a record of the totals.
Which one of us is right?
That is an example of database denormalization. It makes the code more complex and introduces potential for inconsistencies, but the query will be faster. Whether that's worth it depends on how much you need the performance boost.
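A minimal sketch of that trade-off, assuming a hypothetical `balances` table that is updated in the same database transaction as each insert into `transactions` (amounts are stored as integer cents; SQLite is a stand-in, and all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        amount INTEGER NOT NULL
    );
    -- Denormalized running total, one row per user.
    CREATE TABLE balances (user_id INTEGER PRIMARY KEY, total INTEGER NOT NULL);
""")

def record_transaction(user_id, amount):
    """Write the transaction and adjust the running total atomically."""
    with conn:  # both writes commit together or roll back together
        conn.execute(
            "INSERT INTO transactions (user_id, amount) VALUES (?, ?)",
            (user_id, amount),
        )
        updated = conn.execute(
            "UPDATE balances SET total = total + ? WHERE user_id = ?",
            (amount, user_id),
        ).rowcount
        if updated == 0:
            conn.execute(
                "INSERT INTO balances (user_id, total) VALUES (?, ?)",
                (user_id, amount),
            )

def balance_by_sum(user_id):
    """Normalized approach: SUM() over the transactions."""
    return conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transactions WHERE user_id = ?",
        (user_id,),
    ).fetchone()[0]

def balance_by_total(user_id):
    """Denormalized approach: single-row lookup."""
    row = conn.execute(
        "SELECT total FROM balances WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0
```

The extra complexity is visible here: any code path that writes to `transactions` without going through `record_transaction` makes the two numbers disagree.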
The sum could also be quite fast (i.e. fast enough) if it can be indexed properly.
A third way would be using cached aggregates that are periodically recalculated. Works best if you don't need real-time data (such as for account activity up until yesterday, which you can maybe augment with real-time data from the smaller set of today's data).
Again, the tradeoff is between making things fast and keeping things simple (don't forget that complexity also tends to introduce bugs and increase maintenance costs). It's not a matter of one approach being "right" for all situations.
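The "cached aggregate plus today's data" variant could be sketched like this. The schema and names (`balance_snapshots`, `rebuild_snapshots`, `current_balance`) are assumptions for illustration, dates are ISO strings so they compare correctly as text, and SQLite is only a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (user_id INTEGER, day TEXT, amount INTEGER);
    CREATE INDEX idx_tx_user_day ON transactions (user_id, day);
    -- Hypothetical cache: total per user up to and including through_day.
    CREATE TABLE balance_snapshots (
        user_id INTEGER PRIMARY KEY,
        through_day TEXT NOT NULL,
        total INTEGER NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "2024-01-01", 100), (1, "2024-01-02", -30), (1, "2024-01-03", 5)],
)

def rebuild_snapshots(through_day):
    """Periodic job: recompute cached totals up to a cutoff day."""
    with conn:
        conn.execute("DELETE FROM balance_snapshots")
        conn.execute(
            "INSERT INTO balance_snapshots "
            "SELECT user_id, ?, SUM(amount) FROM transactions "
            "WHERE day <= ? GROUP BY user_id",
            (through_day, through_day),
        )

def current_balance(user_id):
    """Cached total plus a live SUM over rows newer than the snapshot."""
    row = conn.execute(
        "SELECT through_day, total FROM balance_snapshots WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    # With no snapshot, "" sorts before every date, so all rows are summed live.
    through_day, cached = row if row else ("", 0)
    recent = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transactions "
        "WHERE user_id = ? AND day > ?",
        (user_id, through_day),
    ).fetchone()[0]
    return cached + recent
```

This only stays correct if rows at or before the cutoff are never modified after the snapshot is taken.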
I don't think that one solution fits all.
You can go very far with a good set of indexes and well written queries. I would start with querying real time until you can't, and then jump to the next solution.
From there, you can go to storing aggregates for all non changing data (for example, beginning of time up to prior month), and just query the sum for any data that changes in this month.
You can save aggregated tables, but how many different kinds of aggregates are you going to save? At some point you have to look into some kind of a multi dimensional structure.

Will denormalization improve performance in SQL? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I would like to speed up our SQL queries. I have started to read a book on data warehousing, where you have a separate database with the data in different tables, etc. The problem is that I do not want to create a separate reporting database for each of our clients, for a few reasons:
We have over 200, and maintenance on these databases is enough work already
Reporting data must be available immediately
I was wondering if I could simply denormalize the tables that I report on, as there are currently a lot of JOINs and I believe these are expensive (about 20,000,000 rows in the tables). If I copied the data into multiple tables, would this increase performance by a fair bit? I know there are issues with data being copied all over the place, but this could also be good from a history point of view.
Denormalization is no guarantee of an improvement in performance.
Have you considered tuning your application's queries? Take a look at what reports are running, identify places where you can add indexes and partitioning. Perhaps most reports only look at the last month of data - you could partition the data by month, so only a small amount of the table needs to be read when queried. JOINs are not necessarily expensive if the alternative is a large denormalized table that requires a huge full table scan instead of a few index scans...
Your question is much too general - talk with your DBA about doing some traces on the report queries (and look at the plans) to see what you can do to help improve report performance.
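As a small illustration of "look at the plans": the sketch below compares the query plan of a date-range report before and after adding an index. It uses SQLite's `EXPLAIN QUERY PLAN` purely as a stand-in (the question's database engine will have its own plan tooling), and the `sales` table, index name, and `plan_for` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, sold_on TEXT, amount INTEGER)"
)

def plan_for(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

# A report that only reads one month of data.
report = "SELECT SUM(amount) FROM sales WHERE sold_on >= ? AND sold_on < ?"
month = ("2024-01-01", "2024-02-01")

before = plan_for(report, month)  # no usable index yet: full table scan

# Index on the filter column, also covering the summed column.
conn.execute("CREATE INDEX idx_sales_sold_on ON sales (sold_on, amount)")
after = plan_for(report, month)   # plan should now mention the index

print(before)
print(after)
```

Comparing the two plans for the actual report queries is usually a cheaper first step than restructuring the tables.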
The question is very general. It is hard to answer whether denormalization will increase performance.
Basically, it CAN. But personally, I wouldn't consider denormalizing as a solution for reporting issues. In my experience, business people love to build huge reports which would kill an OLTP DB at the least appropriate time. I would continue reading about data warehousing :)
Yes, for an OLAP application your performance will improve with denormalization. But if you use the same denormalized tables for your OLTP application, you will see a performance bottleneck there. I suggest you create new denormalized tables or a materialized view for your reporting purposes. You can also incrementally fast-refresh your MV, so you will get reporting data immediately.

Why don't databases have good full text indexes [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Why don't any of the major RDBMS systems like MySQL, SQL Server, Oracle, etc. have good full text indexing support?
I realize that most databases support full text indexes to some degree, but that they are usually slower, and with a smaller feature set. It seems that every time you want a really good full text index, you have to go outside the database and use something like Lucene/Solr or Sphinx.
Why isn't the technology in these full text search engines completely integrated into the database engine? There are a lot of problems with keeping the data in another system such as Lucene, including keeping the data up to date and the inability to join the results with other tables. Is there a specific technological reason why these two technologies can't be integrated?
RDBMS indexes serve a different purpose. They are there to offer the engine a way to optimize access to the data, both by the user and by the engine itself (to resolve joins, check foreign keys, etc.). As such, they are really not a functional data structure.
Tools like full-text search, tag clouds may be very useful for enhancing the user experience. These serve only the user and applications. They are functional, and require real data structures... secondary tables or derived fields... with, typically, a whole lot of triggers and code to keep these updated.
And IMHO... there are many ways to implement these technologies. RDBMS producers would have to maybe choose some tech over another... for reasons that have nothing to do with the RDBMS engine itself. That does not really seem their job.

Doesn't EF use the same old concept of creating a large query that degrades performance? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I know this could be a stupid question, but as a beginner I must ask the experts to clear up my doubt.
When we use Entity Framework to query data from a database by joining multiple tables, it creates a SQL query, and this query is then sent to the database to fetch records.
"We know that if we execute a large query from .NET code, it will increase network traffic and performance will go down. So instead of writing a large query, we create and execute a stored procedure, and that significantly increases performance."
My question is: doesn't EF use the same old concept of creating a large query that degrades performance?
Experts, please clear up my doubts.
Thanks.
Contrary to popular myth, stored procedures are not any faster than a regular query. There are some slight possible direct performance improvements when using stored procedures (execution plan caching, precompilation), but with a modern caching environment and newer query optimizers and performance analysis engines, the benefits are small at best. Combine this with the fact that these potential optimizations were already just a small part of the query result generation process, the most time-intensive part being the actual collection, seeking, sorting, merging, etc. of data, and these stored procedure advantages are downright irrelevant.
Now, one other point. There is absolutely no way, ever, that sending 500 bytes for the text of a query versus 50 bytes for the name of a stored procedure is going to have any effect on a 100 Mb/s link.
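The arithmetic behind that last point can be worked out directly. The sketch below only accounts for raw transmission time on the link; it ignores protocol overhead and round-trip latency, which are the same for both cases anyway. The function name is illustrative.

```python
def transfer_time_us(payload_bytes, link_bits_per_second):
    """Time to push a payload onto the wire, in microseconds."""
    return payload_bytes * 8 * 1_000_000 / link_bits_per_second

LINK = 100_000_000  # 100 Mb/s

query_text_us = transfer_time_us(500, LINK)  # full query text
proc_name_us = transfer_time_us(50, LINK)    # stored procedure name only

print(f"query text: {query_text_us} us")
print(f"proc name:  {proc_name_us} us")
print(f"difference: {query_text_us - proc_name_us} us")
```

That is 40 microseconds versus 4 microseconds, a difference of 36 microseconds, against a query that in the original question takes seconds to run.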