Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
Optimization was never one of my areas of expertise. I have a users table, and every user has many followers. I'm wondering whether I should add a counter column in case some user has a million followers: instead of counting rows in a whole relations table, shouldn't I keep a counter?
I'm working with a SQL database.
Update 1
Right now I'm only planning how I should build my site; I haven't written any code yet. I don't know whether I'll have slow performance, which is why I'm asking.
You should certainly not introduce a counter right away. The counter is redundant data, and it will complicate everything: you will have to manage the additional complexity, and it will slow down the development process.
Better to start with a normalized model and see how it works. If you really run into performance problems, solve them then.
Remember: premature optimization is the root of all evil.
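To make the "start normalized" advice concrete, here is a minimal sketch using Python's sqlite3 as a stand-in for whatever SQL database is in use. All table and column names are invented for illustration; the point is that counting one user's followers against an indexed relations table is an index range scan, not a full-table count.

```python
import sqlite3

# Normalized model: no counter column, just an indexed relations table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE followers (
        user_id     INTEGER NOT NULL REFERENCES users(id),
        follower_id INTEGER NOT NULL REFERENCES users(id),
        PRIMARY KEY (user_id, follower_id)
    );
    -- The composite primary key already indexes user_id, so counting
    -- one user's followers only touches that user's index entries.
""")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol')")
conn.executemany("INSERT INTO followers VALUES (?, ?)", [(1, 2), (1, 3)])

count = conn.execute(
    "SELECT COUNT(*) FROM followers WHERE user_id = ?", (1,)
).fetchone()[0]
print(count)  # 2
```

Only if this indexed count ever becomes a measured bottleneck would a denormalized counter column be worth its maintenance cost.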
It's generally good practice to avoid duplicating data, such as storing a summary of one table's data in another table.
It depends on what this is for. If it is for reporting, speed is usually not an issue and you can use a join.
If it is part of the application and you're running into performance issues with a join or computed column, you may want to consider a summary table generated on a schedule.
If you're not seeing a performance issue, leave it alone.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I want to know which is, in your opinion, the best practice for naming SQL columns.
Example: let's say I have two columns named referenceTransactionId and source.
There is another way to write these: refTxnId and src.
Which one is the better way, and why? Is it a matter of memory usage, or of readability?
Although this is a matter of opinion, I am going to answer anyway. If you are going to design a new database, write out all the names completely. Why?
The name is unambiguous. You'll notice that sites such as Wikipedia spell out complete names, as do standards such as time zones ("America/New_York").
Using a standard like the complete name means that users don't have to "think" about what the column might be called.
Nowadays, people type much faster than they used to. For those who don't, type-ahead and menus provide assistance.
Primary keys and foreign keys, to the extent possible, should have the same name. So, I suspect that referenceTransactionId should simply be transactionId if it is referencing the Transactions table.
This advice comes from the "friction" of working across multiple databases and having to figure out what a column is called.
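A small illustration of the two points above (spelled-out names, and a foreign key that reuses the referenced primary key's name), sketched with sqlite3 and an invented schema:

```python
import sqlite3

# Illustrative schema: full column names, and the foreign key in
# transactionEvents reuses the referenced key's name, transactionId.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        transactionId INTEGER PRIMARY KEY,
        source        TEXT
    );
    CREATE TABLE transactionEvents (
        eventId       INTEGER PRIMARY KEY,
        transactionId INTEGER REFERENCES transactions(transactionId),
        description   TEXT
    );
""")
conn.execute("INSERT INTO transactions VALUES (1, 'web')")
conn.execute("INSERT INTO transactionEvents VALUES (10, 1, 'created')")

# Because both sides share the same name, the join condition
# collapses to a USING clause that reads naturally.
rows = conn.execute("""
    SELECT e.eventId
    FROM transactionEvents AS e
    JOIN transactions AS t USING (transactionId)
""").fetchall()
print(rows)  # [(10,)]
```

Had the column been abbreviated to refTxnId on one side only, the USING shorthand would be unavailable and every reader would have to check which spelling each table uses.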
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm reworking a huge SQL query (2,000 lines) containing lots of CASE expressions, where each CASE is another query. I'd like to know what to do and what to avoid in terms of performance.
Should I write one bigger general query that JOINs everything I'll need, and then condition each CASE on the joined columns?
Or should I write a new query for each CASE (with most of the CASE subqueries using the same tables)?
I've also seen subqueries with an AS alias at the end, so the resulting data can be used in the SELECT or in conditions.
And WITH, before the SELECT, for mostly the same effect: creating a kind of temporary table for conditions and display.
Which one is better in terms of performance?
Thanks
First of all, look for any CASE or subselect that is repeated several times; move that part of the query into a WITH clause and reference it from the rest of the code, so the select runs only once.
Second, avoid subselects as much as you can; again, WITH is a good way to do that.
Format each CASE so it reads easily top to bottom, to avoid mistakes when you modify it later.
Keep in mind that if the WITH is too big, you may exhaust memory and the query will fail, so move only the most repeated expressions there, not all of them.
If possible, split the big query into many small queries and group them in a package, so it's easier to track down errors and keep control of the process.
Edit: all of this assumes you're using Oracle!
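The first suggestion, factoring a repeated subselect into a WITH clause (a common table expression), can be sketched as follows. The answer assumes Oracle, but the WITH syntax is standard SQL; this runnable illustration uses sqlite3 and an invented orders schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 7, 50.0), (2, 7, 150.0), (3, 8, 20.0);
""")

# Instead of repeating "(SELECT SUM(amount) FROM orders o2
# WHERE o2.customer_id = o.customer_id)" inside every CASE branch,
# compute it once per customer in the CTE and join against it.
rows = conn.execute("""
    WITH customer_totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT o.id,
           CASE WHEN t.total > 100 THEN 'big customer'
                ELSE 'small customer' END AS label
    FROM orders o
    JOIN customer_totals t ON t.customer_id = o.customer_id
    ORDER BY o.id
""").fetchall()
print(rows)
```

The optimizer then evaluates the shared aggregate once per customer instead of once per CASE branch per row.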
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Apologies in advance if this is a stupid question. I've more or less just started learning how to use SQL.
I'm making a website that stores main accounts, each with many sub-accounts associated with it. Each sub-account has a few thousand records across various tables.
My question concerns conventional database usage. Is it better to use one database per main account, with everything associated with it stored in the same place; to store everything in one database; or some combination of both?
Some insight would be much appreciated.
Will you need to access more than one of these databases at the same time? If so, put them all in one. You will not like the amount of effort and cost of "joining" them back together to run a query. On top of that, every database you have needs to be managed, and if you ever need to transfer data between them, that can get painful as well.
Segregating data by database is a last resort.
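The single-database option usually means tagging every row with its tenant. Here is a minimal sketch (sqlite3, invented table names) of the kind of schema the answer implies, where a query spanning all of one main account's sub-accounts is a plain join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sub_accounts (
        id              INTEGER PRIMARY KEY,
        main_account_id INTEGER NOT NULL REFERENCES main_accounts(id)
    );
    CREATE TABLE records (
        id             INTEGER PRIMARY KEY,
        sub_account_id INTEGER NOT NULL REFERENCES sub_accounts(id),
        payload        TEXT
    );
    -- Index the tenant columns so per-account queries stay cheap.
    CREATE INDEX idx_sub_main ON sub_accounts(main_account_id);
    CREATE INDEX idx_rec_sub  ON records(sub_account_id);
""")
conn.executescript("""
    INSERT INTO main_accounts VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO sub_accounts VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO records VALUES (100, 10, 'a'), (101, 11, 'b'), (200, 20, 'c');
""")

# One join answers "how many records does main account 1 have?" --
# a question that would require cross-database plumbing if each
# main account lived in its own database.
n = conn.execute("""
    SELECT COUNT(*) FROM records r
    JOIN sub_accounts s ON s.id = r.sub_account_id
    WHERE s.main_account_id = 1
""").fetchone()[0]
print(n)  # 2
```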
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I'm currently in disagreement with my colleague regarding the best design of our database.
We need to access the total user balance from our database of transactions frequently, potentially several times a second.
He says that SQL is fast and all we need to do is SUM() the transactions. I, on the other hand, believe that eventually, with enough users and a large database, our server will spend most of its time summing the same records over and over. My solution is a separate table that keeps a record of the totals.
Which one of us is right?
That is an example of database denormalization. It makes the code more complex and introduces the potential for inconsistencies, but the query will be faster. Whether that's worth it depends on how badly you need the performance boost.
The sum could also be quite fast (i.e., fast enough) if the underlying table is indexed properly.
A third way would be using cached aggregates that are periodically recalculated. Works best if you don't need real-time data (such as for account activity up until yesterday, which you can maybe augment with real-time data from the smaller set of today's data).
Again, the trade-off is between making things fast and keeping things simple (don't forget that complexity also tends to introduce bugs and increase maintenance costs). It's not a matter of one approach being "right" for all situations.
I don't think one solution fits all.
You can go very far with a good set of indexes and well-written queries. I would start by querying in real time until you can't, and then jump to the next solution.
From there, you can store aggregates for all non-changing data (for example, from the beginning of time up to the prior month), and only sum live for the data that still changes this month.
You can save aggregate tables, but how many different kinds of aggregates are you going to save? At some point you have to look into some kind of multi-dimensional structure.
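The "aggregate the closed periods, sum the live ones" idea from both answers can be sketched like this. The schema and period granularity (monthly) are assumptions for illustration, using sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (user_id INTEGER, month TEXT, amount REAL);
    CREATE INDEX idx_txn ON transactions(user_id, month);
    -- Periodically rebuilt summary of months that can no longer change.
    CREATE TABLE monthly_totals (user_id INTEGER, month TEXT, total REAL);
    INSERT INTO transactions VALUES
        (1, '2024-01', 100.0), (1, '2024-01', -30.0),
        (1, '2024-02', 50.0);
    INSERT INTO monthly_totals VALUES (1, '2024-01', 70.0);
""")

# Balance = cached totals for closed months
#         + a live SUM over only the current (small) month.
balance = conn.execute("""
    SELECT
        COALESCE((SELECT SUM(total) FROM monthly_totals
                  WHERE user_id = ?), 0)
      + COALESCE((SELECT SUM(amount) FROM transactions
                  WHERE user_id = ? AND month = ?), 0)
""", (1, 1, '2024-02')).fetchone()[0]
print(balance)  # 120.0
```

The live SUM touches only the current month's rows via the (user_id, month) index, so its cost stays bounded no matter how large the transaction history grows; only the summary-rebuild job ever scans closed months.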
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Currently we have a complex business object which needs around 30 joins on our SQL database to retrieve one item (and this is our main use case). The database is around 2 GB in SQL Server.
We are using Entity Framework to retrieve data, and it takes around 3.5 s to retrieve one item. We have noticed that using subqueries in a parallel invoke is more performant than using joins when there are a lot of rows in the other table (so we have something like 10 subqueries). We don't use stored procedures because we would like to keep the data access layer in "plain C#".
The goal is to retrieve the item in under 1 s without changing the environment too much.
We are looking into NoSQL solutions (RavenDB, Cassandra, Redis with the "document client") and the new "in-memory database" feature of SQL Server.
What do you recommend? Do you think that just one stored procedure call with EF would do the job?
EDIT 1:
We have indexes on all columns on which we are joining.
In my opinion, if you need 30 joins to retrieve one item, something is wrong with the design of your database. Maybe it is correct from the relational point of view, but it is certainly impractical from the functional/performance point of view.
A couple of solutions came to my mind:
Denormalize your database design.
I am pretty sure you can reduce the number of joins and improve your performance a lot with that technique.
http://technet.microsoft.com/en-us/library/cc505841.aspx
Use a NoSQL solution like you mention.
Given the number of SQL tables involved, this is not going to be an easy change, but maybe you can start by introducing NoSQL as a cache for these complex objects.
NoSQL Use Case Scenarios or WHEN to use NoSQL
Of course, using stored procedures in this case is much better and will improve performance, but I don't believe it will make a dramatic change. You should try it and compare. Also review all your indexes.
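The "NoSQL as a cache for the complex object" suggestion amounts to a document cache: assemble the 30-join item once, serialize it, and serve subsequent reads as a single key lookup. Here is a hedged sketch in Python with sqlite3 and JSON standing in for the real EF query and the document store; load_item_slow, the table name, and the item shape are all invented:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_cache (item_id INTEGER PRIMARY KEY, doc TEXT)")

def load_item_slow(item_id):
    # Stand-in for the expensive 30-join assembly (the ~3.5 s EF query).
    return {"id": item_id, "parts": ["a", "b"], "owner": "x"}

def get_item(item_id):
    row = conn.execute("SELECT doc FROM item_cache WHERE item_id = ?",
                       (item_id,)).fetchone()
    if row:
        return json.loads(row[0])           # fast path: one key lookup
    item = load_item_slow(item_id)          # slow path: run the joins
    conn.execute("INSERT INTO item_cache VALUES (?, ?)",
                 (item_id, json.dumps(item)))
    return item

first = get_item(42)    # slow path, populates the cache
second = get_item(42)   # served from the cache
print(first == second)  # True
```

The hard part, as with any denormalization, is invalidation: every write path that touches one of the 30 underlying tables must delete or rebuild the cached document, which is why introducing this only for the hottest read path is a sensible first step.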