Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I was recently in an interview and was asked this question:
After one year in production, the data in your application's database has become massive. What is the best way to optimize performance on the database side, not the coding side, whether the database is Oracle or SQL Server? I just want to know the best answer to this question.
I can give you an answer, but can't guarantee that an interviewer would like it.
The best way to optimise the performance is to understand what your application does, and the data structures that the system provides. You must understand the business so that you can understand the data, and when you do that you'll know whether the SQL submitted to the system is "asking the correct question", and doing so in a way that makes sense for the data and its distribution.
Furthermore, you should measure and document what the normal behaviour of the system is, and the cycles it might go through on a daily, weekly, monthly, quarterly and annual basis. You should be prepared to be able to quantify any deviation from normal performance.
You must understand the database technology itself. The concepts, the memory structures and processing, REDO, UNDO, index and table types, and maybe partitioning, parallelism, and RAC. The upsides and the downsides.
You must know SQL extremely well, and be completely up to date on its capabilities in your DB version, and any new ones now available. You must be able to read a raw execution plan straight from DBMS_XPlan(). Tracing query execution must be within your skill set.
You must understand query transformation and optimisation, the use of bind variables, and statistics.
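As a rough illustration of bind variables and plan reading, here is a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in, since the answer is about Oracle (where you would read the plan through `DBMS_XPLAN`); the `orders` table and `idx_orders_customer` index are made up for the example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Oracle / SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    total_cents INTEGER NOT NULL
);
CREATE INDEX idx_orders_customer ON orders (customer_id);
""")
conn.executemany(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
    [(1, 500), (1, 700), (2, 300)],
)

# The "?" placeholder is a bind variable: the statement text stays the same
# for every customer, so the engine can reuse the parsed statement.
QUERY = "SELECT id, total_cents FROM orders WHERE customer_id = ?"
rows = conn.execute(QUERY, (1,)).fetchall()

# Ask the engine how it intends to run the query. The last column of each
# plan row is the human-readable description of that step.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```

The thing to look for in the printed plan is whether the engine searches via the index or scans the whole table; the exact wording differs between engines and versions.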
If I had to choose only one of the above, it would be that you must have measured and documented historical performance, and be able to quantify deviations from it, because without that you will never know where to start.
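To make "quantify deviations" concrete, here is a small sketch of one possible way to do it: compare a new measurement against the mean and standard deviation of recorded history. The function name and the sample timings are hypothetical, and a real baseline would also account for the daily, weekly and monthly cycles mentioned above.

```python
import math
from statistics import mean, stdev


def deviation_from_baseline(history, current):
    """Return how many sample standard deviations `current` lies from the mean of `history`."""
    baseline = mean(history)
    spread = stdev(history)  # needs at least two historical measurements
    if spread == 0:
        # Perfectly flat history: any difference is infinitely unusual.
        if current == baseline:
            return 0.0
        return math.copysign(math.inf, current - baseline)
    return (current - baseline) / spread


# Hypothetical durations (in seconds) of the same nightly batch job.
nightly_batch_seconds = [118, 122, 120, 119, 121]
score = deviation_from_baseline(nightly_batch_seconds, 150)
print(f"{score:.1f} standard deviations from normal")
```

A score near zero means business as usual; a large positive score tells you where to start looking.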
I'm pretty sure the point of the question was to see how you deal with vague, overly broad questions. One thing you did that was pretty positive, was to seek out authoritative answers on SO. Don't know if that's going to help you now that the interview is done.
So - how do you respond to such a question? An "I have no way of knowing" is probably not the approach to take - even if it is the correct answer.
Maybe something like, "I'm not entirely sure what you're asking - so let me try to understand with a couple of questions. Are we talking about query performance or update performance? Are there indexes to support the workload? What makes you feel optimization is necessary?"
I think it is as much about your approach to problem solving as any particular tech.
But, then on the other hand, maybe I'm wrong. Maybe the first answer is always "Index the hell out of it!" :-D
Interviewing is a nightmare, isn't it?
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
Optimization has never been my area of expertise. I have a users table, and every user has many followers. I'm wondering whether I should use a counter column in case some user has a million followers. Instead of counting rows in a whole table of relations, shouldn't I use a counter?
I'm working with an SQL database.
Update 1
Right now I'm only planning how I should build my site. I haven't written the code yet. I don't know if I'll have slow performance; that's why I'm asking.
You should certainly not introduce a counter right away. The counter is redundant data and it will complicate everything. You will have to master the additional complexity and it'll slow down the development process.
Better to start with a normalized model and see how it works. If you really run into performance problems, solve them then.
Remember: premature optimization is the root of all evil.
It's generally good practice to avoid duplicating data, such as storing a summary of one table's rows in another table.
It depends on what this is for. If this is for reporting, speed is usually not an issue and you can use a join.
If it has to do with the application and you're running into performance issues with a join or a computed column, you may want to consider a summary table generated on a schedule.
If you're not seeing a performance issue, leave it alone.
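For reference, the normalized starting point the answers recommend could look something like the sketch below. It runs on Python's built-in `sqlite3`, and the `users` / `follows` tables and their columns are assumptions, not the asker's schema. The composite primary key leads with `followed_id`, so counting one user's followers can be answered from the index rather than by scanning every relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- One row per "follower follows followed" relation, no counter column.
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followed_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (followed_id, follower_id)
);
""")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
conn.executemany(
    "INSERT INTO follows (follower_id, followed_id) VALUES (?, ?)",
    [(2, 1), (3, 1), (1, 2)],
)


def follower_count(conn, user_id):
    """Count followers on demand instead of maintaining a redundant counter."""
    row = conn.execute(
        "SELECT COUNT(*) FROM follows WHERE followed_id = ?", (user_id,)
    ).fetchone()
    return row[0]


print(follower_count(conn, 1))
```

If measurements later show this query is a bottleneck, a counter or scheduled summary table can still be added without changing the `follows` table.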
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I'm currently in disagreement with my colleague regarding the best design of our database.
We need to access the total user balance from our database of transactions frequently, potentially several times a second.
He says that SQL is fast and all we need to do is SUM() the transactions. I, on the other hand, believe that eventually, with enough users and a large database, our server will be spending most of its time summing the same records in the database. My solution is to have a separate table to keep a record of the totals.
Which one of us is right?
That is an example of database denormalization. It makes the code more complex and introduces potential for inconsistencies, but the query will be faster. Whether that's worth it depends on how much you need the performance boost.
The sum could also be quite fast (i.e. fast enough) if it can be indexed properly.
A third way would be using cached aggregates that are periodically recalculated. Works best if you don't need real-time data (such as for account activity up until yesterday, which you can maybe augment with real-time data from the smaller set of today's data).
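The cached-aggregate idea could be sketched roughly as follows, again using Python's built-in `sqlite3` as a stand-in. The table names, the `as_of` cut-off column and the helper functions are all hypothetical, and dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL,
    amount_cents INTEGER NOT NULL,
    created_on   TEXT NOT NULL          -- ISO date, e.g. '2024-01-03'
);
CREATE INDEX idx_tx_user_date ON transactions (user_id, created_on);

-- Periodically recalculated aggregate: balance up to and including as_of.
CREATE TABLE balance_snapshot (
    user_id       INTEGER PRIMARY KEY,
    balance_cents INTEGER NOT NULL,
    as_of         TEXT NOT NULL
);
""")


def refresh_snapshot(conn, user_id, as_of):
    """Recompute the cached balance for everything up to and including `as_of`."""
    total = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM transactions "
        "WHERE user_id = ? AND created_on <= ?",
        (user_id, as_of),
    ).fetchone()[0]
    conn.execute(
        "INSERT OR REPLACE INTO balance_snapshot (user_id, balance_cents, as_of) "
        "VALUES (?, ?, ?)",
        (user_id, total, as_of),
    )


def current_balance(conn, user_id):
    """Cached balance plus a real-time sum over the small set of newer rows."""
    snap = conn.execute(
        "SELECT balance_cents, as_of FROM balance_snapshot WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    base, as_of = snap if snap else (0, "")  # no snapshot yet: sum everything
    recent = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM transactions "
        "WHERE user_id = ? AND created_on > ?",
        (user_id, as_of),
    ).fetchone()[0]
    return base + recent


conn.executemany(
    "INSERT INTO transactions (user_id, amount_cents, created_on) VALUES (?, ?, ?)",
    [(1, 1000, "2024-01-01"), (1, -250, "2024-01-02"), (1, 500, "2024-01-03")],
)
refresh_snapshot(conn, 1, "2024-01-02")
print(current_balance(conn, 1))
```

Because the snapshot is only ever rebuilt from the transactions table, it can be thrown away and recalculated at any time, which limits the damage an inconsistency can do.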
Again, the tradeoff is between making things fast and keeping things simple (don't forget that complexity also tends to introduce bugs and increase maintenance costs). It's not a matter of one approach being "right" for all situations.
I don't think that one solution fits all.
You can go very far with a good set of indexes and well written queries. I would start with querying real time until you can't, and then jump to the next solution.
From there, you can go to storing aggregates for all non changing data (for example, beginning of time up to prior month), and just query the sum for any data that changes in this month.
You can save aggregated tables, but how many different kinds of aggregates are you going to save? At some point you have to look into some kind of multidimensional structure.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Currently we have a complex business object which needs around 30 joins on our SQL database to retrieve one item (and this is our main use case). The database is around 2 GB in SQL Server.
We are using Entity Framework to retrieve data, and it takes around 3.5 seconds to retrieve one item. We have noticed that running subqueries in parallel performs better than using joins when the other table has a lot of rows (so we have something like 10 subqueries). We don't use stored procedures because we would like to keep the data access layer in "plain C#".
The goal is to retrieve the item in under 1 second without changing the environment too much.
We are looking into NoSQL solutions (RavenDB, Cassandra, Redis with the "document client") and the new "in-memory database" feature of SQL Server.
What do you recommend? Do you think that a single stored procedure call with EF would do the job?
EDIT 1:
We have indexes on all columns used in joins.
In my opinion, if you need 30 joins to retrieve one item, something is wrong with the design of your database. It may be correct from the relational point of view, but it is certainly impractical from the functional and performance point of view.
A couple of solutions come to mind:
Denormalize your database design.
I am pretty sure that you can reduce the number of joins, and improve your performance a lot, with that technique.
http://technet.microsoft.com/en-us/library/cc505841.aspx
Use a NoSQL solution like you mention.
Due to the number of SQL tables involved this is not going to be an easy change, but maybe you can start by introducing NoSQL as a cache for these complex objects.
NoSQL Use Case Scenarios or WHEN to use NoSQL
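One way to picture "NoSQL as a cache" is a read-through cache in front of the expensive query. The sketch below is in Python rather than the asker's C#, and everything in it is hypothetical: a plain dictionary stands in for the key-value store (Redis, RavenDB and so on), and `load_from_sql` stands in for the 30-join retrieval.

```python
import json


class ReadThroughCache:
    """Serve assembled objects from a key-value store, loading from SQL on a miss."""

    def __init__(self, load_from_sql):
        self._store = {}            # stand-in for an external key-value store
        self._load = load_from_sql  # the slow, join-heavy retrieval

    def get(self, item_id):
        key = f"item:{item_id}"
        cached = self._store.get(key)
        if cached is not None:
            return json.loads(cached)   # cache hit: no SQL involved
        item = self._load(item_id)      # cache miss: pay the join cost once
        self._store[key] = json.dumps(item)
        return item

    def invalidate(self, item_id):
        """Call this whenever the underlying rows change."""
        self._store.pop(f"item:{item_id}", None)


sql_calls = []


def load_from_sql(item_id):
    # Hypothetical loader; a real one would run the joins and build the object.
    sql_calls.append(item_id)
    return {"id": item_id, "name": "example item", "parts": ["a", "b"]}


cache = ReadThroughCache(load_from_sql)
first = cache.get(42)
second = cache.get(42)   # served from the cache
cache.invalidate(42)
third = cache.get(42)    # loaded from SQL again
print(len(sql_calls))
```

The hard part in practice is invalidation: every write path that touches one of the joined tables has to invalidate or refresh the cached object.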
Of course, using stored procedures in this case is much better and will improve performance, but I do not believe it is going to make a dramatic change. You should try it and compare. Also review all your indexes.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
There are a lot of articles on the web supporting the trend to move to a graph database like Neo4j... but I can't find much against them.
When would a graph database not be the best solution?
Any links to articles that compare graphs, nosql, and relational databases would be great.
Currently I would not use Neo4j in a high volume write situation. The writes are still limited to a single machine, so you're restricted to a single machine's throughput, until they figure out some way of sharding (which is, by the way, in the works). In high volume write situations, you would probably look at some other store like Cassandra or MongoDB, and sacrifice other benefits a graph database gives you.
Another thing I would not currently use Neo4j for is full-text search. It does have some built-in facility (it uses Lucene for indexing under the hood), but it is limited in scope and difficult to use from the latest Cypher. I understand that this will improve rapidly in the next couple of releases, and I look forward to that. Something like ElasticSearch or Solr would do a better job for FTS-related things.
Contrary to popular belief, tabular data is often well-fitted to the graph, unless you really have very denormalized data, like log records.
The good news is you can take advantage of many of these things together, picking the best tool for the job, and implement a polyglot persistence solution to answer your questions the best way possible.
Also, I would not use Neo4j for serving and storing binary data. There are much better options for images, videos and large text documents out there - use them either as indexes with Neo4j, or just reference them.
When would a graph database not be the best solution?
When you work in a conservative company.
Insert some well thought-out technical reason here.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
I am a student, with decent knowledge of SQL, but have had very little to do with triggers in the past. I've looked at a few sites for guidance, but comprehensive explanation on all commonly used statements seems fairly sparse.
Is there a 'definitive' site for this kind of thing? Perhaps like a w3schools for advanced SQL?
Once you know a little SQL, try to check out Joe Celko's books. Advanced SQL Programming has a short section on triggers. Since you're a student, you can probably get a copy at the library. If you think you're going to be doing deeper SQL dev work, you'll be glad to score your own personal copy of the book. You can get the relational DB engine to do a significant amount of work in a small amount of code - thinking that way will make you a much more efficient programmer. Most book stores (my local Borders always has a couple copies) will have a copy on the shelf, so browse before you buy.
Also, check out the online manuals for the database you're using as itsmatt suggests.
I've always thought that the SQL Server Books Online (installed with SQL Server) were a good source of info.
This sounds a bit like an "old shoe or glass bottle" question.
Triggers are one of those things that you should really stay away from unless you really really know what you're doing and have a very good reason for doing what you're doing. So naturally, one of the prerequisites to ever using a trigger is that you should have a thorough understanding of how they work and their implications. Thus, you can see how the idea of an "Intro to Triggers" text may sound like a very dangerous thing to some people.
So my advice, cruel as it may sound, is this: If you're the sort of person who needs an intro text on this particular topic, then you might be better served in the long run by simply avoiding Triggers for the time being.