Hi all, I'm completely new to maintenance tasks on SQL Server. I've set up a data warehouse that reads a load of XML files and imports the data into several tables using an SSIS package. I've set indexes on the tables concerned and optimized my SSIS package. However, I know that I should perform some maintenance tasks, but I don't really know where to begin. We are talking about quite a bit of data: we keep data for up to 6 months, and so far we have 3 months' worth. The database is currently 147142.44 MB, with roughly 57690230 rows in the main table, so it could easily double in size. What are your recommendations?
Index rebuilds and statistics updates are part of normal maintenance, but I would also look at all of the currently long-running queries and try to do some index tuning before the data size grows. Resizing the database also forms part of a normal maintenance plan. If you can predict the growth and allocate enough space between maintenance runs, you can avoid the performance hit of automatic space allocation (which will always happen at the worst possible time).
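As a rough sketch of the pre-sizing idea: the database name WarehouseDB, the logical file name WarehouseDB_Data, and the sizes below are all assumptions for illustration, so check the real logical names in sys.database_files first.

```sql
-- Hypothetical database and file names; sizes are illustrative only.
USE WarehouseDB;
GO

-- Current size and growth settings of each file.
-- size is stored in 8 KB pages, so dividing by 128 gives MB.
SELECT name, type_desc, size / 128.0 AS size_mb, growth, is_percent_growth
FROM sys.database_files;
GO

-- Grow the data file once, during a maintenance window, and use a fixed
-- growth increment rather than a percentage for any later auto-growth.
ALTER DATABASE WarehouseDB
MODIFY FILE (NAME = WarehouseDB_Data, SIZE = 300GB, FILEGROWTH = 4GB);
GO
```

With roughly 147 GB after 3 months and a 6-month retention window, a target somewhere around double the current size is a plausible starting point, but the right number depends on how the load actually grows.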
Related
I have a stored procedure in Azure DW which runs very slowly. I copied all the tables and the SP to a different server, and there it takes very little time to execute. I created the tables using HASH distribution on the unique field, but the SP still runs very slowly. Please advise how I can improve the performance of the SP in Azure DW.
From your latest comment, the data sample is way too small for any reasonable tests on SQL DW. Remember SQL DW is MPP while your local on-premises SQL Server is SMP. Even with DWU100, the underlying layout of this MPP architecture is very different from your local SQL Server. For instance, every SQL DW has 60 user databases powering the DW and data is spread across them. Default storage is clustered columnstore, which is optimized for common DW type workloads.
When a query is sent to DW, it has to build a distributed query plan that is pushed to the underlying DBs, where each builds a local plan, executes it, and passes the results back up the stack. This seems like a lot, and it is for small data sets and simple queries. However, when you are dealing with hundreds of TBs of data with billions of rows and you need to run complex aggregations, this additional overhead is relatively tiny. The benefits you get from the MPP processing power make it inconsequential.
There's no hard number on the actual size where you'll see real gains but at least half a TB is a good starting point and rows really should be in the tens of millions. Of course, there are always edge cases where your data set might not be huge but the workload naturally lends itself to MPP so you might still see gains but that's not common. If your data size is in the tens or low hundreds of GB range and won't grow significantly, you're likely to be much happier with Azure SQL Database.
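To see how the data is actually spread across the distributions, something along these lines can help. It is only a sketch: the table dbo.FactOrders and its columns are made up for illustration.

```sql
-- Hypothetical fact table; column names are for illustration only.
CREATE TABLE dbo.FactOrders
(
    OrderId     BIGINT        NOT NULL,
    CustomerId  INT           NOT NULL,
    OrderDate   DATE          NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(OrderId),   -- rows are assigned to distributions by hashing OrderId
    CLUSTERED COLUMNSTORE INDEX     -- the default storage, stated explicitly here
);

-- Rows and space used per distribution; large differences between
-- distributions indicate skew.
DBCC PDW_SHOWSPACEUSED('dbo.FactOrders');
```

With a tiny sample, most of the distributions hold only a handful of rows, so the fixed cost of coordinating them dominates the run time.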
As for resource class management, workload monitoring, etc., check out the following links:
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-develop-concurrency
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-manage-monitor
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-best-practices
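As a starting point for the monitoring side, the request DMVs show what the distributed plan for a slow statement looked like. This is a minimal sketch; 'QID1234' is a placeholder request id, not a real value.

```sql
-- Longest-running requests, including the resource class they ran under.
SELECT TOP 10 request_id, status, submit_time, total_elapsed_time,
       resource_class, command
FROM sys.dm_pdw_exec_requests
ORDER BY total_elapsed_time DESC;

-- Steps of the distributed plan for one request.
-- 'QID1234' is a placeholder; substitute a request_id from the query above.
SELECT step_index, operation_type, location_type, status,
       total_elapsed_time, row_count, command
FROM sys.dm_pdw_request_steps
WHERE request_id = 'QID1234'
ORDER BY step_index;
```

Steps that move data between distributions are usually the first thing to look at when a stored procedure is slow on DW but fast on a single server.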
I have a database with tables that grow every day. I cannot predict which tables are going to grow and which are not as I'm not the one who is putting the data into them.
Is there a way to find tables that need indexes at a particular point in time? Is there a way, in SQL Server, to notify me if a database needs tuning on certain tables?
This is a product we have deployed at different client locations and we cannot go onto their servers every time to check if they have a performance issue. What I was thinking about is something that can notify me if there are performance issues on certain tables, so as the new patches go to the clients we can add these indexes or tuned queries.
After reading "Insertion of data after creating index on empty table or creating unique index after inserting data on oracle?", I'm not willing to create indexes while installing databases or when the tables have few rows or are empty.
As I understand it, we should not create indexes on small tables because it can hurt write performance.
This is only a real concern if you're bulk loading or otherwise generating a hundred million records each day and write performance is a problem. Indexes do increase write times because they have to be updated when data is written, but unless you're running on a potato or running very high loads it's unlikely to be a problem. You'd know it was a problem before you encountered it.
If we're talking about small tables (less than 100 pages) then it's much more likely that indexes won't be useful because the data set is so small, but you shouldn't be concerned about impacting write performance.
Overall, your application should ship with indexes that support the queries you expect to be run, based on your unit testing and staging. You will need feedback from your customers or clients, but until you really know how people use their data, you're going to have to make a best guess.
The general question of "How do I know what indexes I need when I don't know what queries will be run?" is better suited to DBA Stack Exchange. Briefly, you'll need to use dynamic management views for that. The three missing index dynamic views can be used for this. The example query given isn't horrible:
SELECT mig.*, statement AS table_name,
column_id, column_name, column_usage
FROM sys.dm_db_missing_index_details AS mid
CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle)
INNER JOIN sys.dm_db_missing_index_groups AS mig
ON mig.index_handle = mid.index_handle
ORDER BY mig.index_group_handle, mig.index_handle, column_id;
You shouldn't just blindly follow what these views say, however. They give a good lead on what to look at, but you have to look at the column order and the queries actually being run to tell whether a suggested index makes sense.
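One way to decide which suggestions to look at first is to join in sys.dm_db_missing_index_group_stats and rank by a rough benefit figure. The estimated_benefit expression below is a common heuristic, not an official metric, so treat the ordering as a hint only.

```sql
SELECT TOP 25
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.user_scans,
    migs.avg_total_user_cost,
    migs.avg_user_impact,
    -- Rough ranking heuristic: cost * expected improvement * how often it was wanted.
    migs.avg_total_user_cost * migs.avg_user_impact
        * (migs.user_seeks + migs.user_scans) AS estimated_benefit
FROM sys.dm_db_missing_index_groups AS mig
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid
    ON mid.index_handle = mig.index_handle
ORDER BY estimated_benefit DESC;
```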
You should also monitor index usage statistics and examine how much and in what way indexes are used compared to how much they have to be updated. Indexes that are updated a million times a day but are used once or twice should be considered for removal.
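A minimal sketch of that comparison, run in the database you care about, could look like this. Indexes that never appear in the usage DMV since the last restart show up with zero reads and writes.

```sql
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name AS index_name,
    ISNULL(us.user_seeks + us.user_scans + us.user_lookups, 0) AS reads,
    ISNULL(us.user_updates, 0) AS writes
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON us.object_id = i.object_id
   AND us.index_id = i.index_id
   AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.index_id > 1          -- skip heaps (0) and clustered indexes (1)
ORDER BY writes DESC, reads ASC;
```

Indexes near the top with many writes and few reads are the candidates for removal, but check that they are not enforcing a unique constraint before dropping anything.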
You will also want to monitor query stats to look for queries that run for a long time. This may be poor development on the part of your client, but can also be a sign of design problems.
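For the long-running queries, a sketch along these lines pulls the slowest cached statements by average elapsed time. It only covers plans still in the plan cache.

```sql
SELECT TOP 20
    qs.execution_count,
    -- total_elapsed_time is reported in microseconds.
    qs.total_elapsed_time / qs.execution_count / 1000.0 AS avg_elapsed_ms,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    -- Offsets are byte positions in a Unicode string, hence the division by 2.
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_elapsed_ms DESC;
```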
This is not even a comprehensive overview of things to look for, however. There's a lot to database maintenance and operations. That's why DBAs make a good living. This is just the tip of the iceberg. Just the tip for indexes, even.
What I'd do if you want to maintain this is consider asking your customers to allow you to send feedback for performance analysis. Set up a broker that monitors the management views and sends compiled and sanitized information back to yourselves. You'll need to be very careful about what you send because you don't want to be sending actual customer data, of course.
Keep in mind that dynamic management views typically reset when the instance does, so the results will not typically represent the entire lifespan of the database.
I'm currently researching a very large table (~100 million rows, 35 columns). It's currently stored in a SQL DB, but the queries I'm running (and they're various) run very, very slowly.
So I gather I should probably move to a NoSQL DB. The questions are:
How can I tell which (NoSQL) db is best for me?
How can I move my current SQL table to the new NoSQL scheme?
OR should I stay in SQL and just fine tune it?
A few more details: rows will not be added or removed. This is historical data, and all of the analysis will be done on that table. I plan to run various queries on it. The data is numerical.
I routinely work with a SQL Server 2012 table that has 900 million rows. This table has rows being added to it about every 2 minutes with a total of about 200K per day. I can query this table and get rows back in a couple seconds (using the clustered index / PK). I can also query on one of the other indexes and get results back in seconds or less.
So, it's all a matter of making sure your indexes are set up correctly, AND BEING USED!! Check your queries against the query plan being generated and make sure seeks are being done.
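A quick way to check this on SQL Server is to turn on the I/O and timing statistics and run the query with the actual execution plan enabled. The table and column names below are made up for illustration.

```sql
-- Hypothetical table and column names.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT ReadingId, SensorId, ReadingTime, ReadingValue
FROM dbo.Readings
WHERE SensorId = 42
  AND ReadingTime >= '2013-01-01'
  AND ReadingTime <  '2013-02-01';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The messages output then lists logical reads per table. A query that seeks on a suitable index typically reads a small number of pages, while a scan reads something close to the whole table.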
There could be good reasons for moving to NoSQL, or something similar. But moving to NoSQL because you think you can't get good performance in SQL Server, before making sure you've done everything you can do to improve performance first, is not a good reason.
Some food for thought:
100M rows is well within SQL's "sweet spot". You can grow by x10 and still be assured that SQL will be able to support you with fairly trivial effort.
NoSQL is not a silver bullet for solving performance problems at scale. It offers a set of tradeoffs which, with careful planning, can provide better results. But it sounds like you don't fully understand your performance issues in SQL, and without that your chances of making the correct design decisions in a NoSQL environment are slim.
One of the common tradeoffs in NoSQL systems is that they typically provide less flexibility in querying, in return for greater flexibility in schema management. You mentioned your queries are "various". If they are truly varied, or more importantly frequently changing, then moving to a NoSQL system can put you in a world of pain, especially if you are not familiar with the technology yet.
Bottom line- You aren't doing anything which is clearly "beyond" the capabilities of SQL, and your problems are probably caused more by inefficient implementation than by any inherent platform limitations. Moving to a NoSQL system won't magically solve any of your problems, and will probably introduce new ones.
If you are running a query on columns that are not indexed, it will be very slow. You can add more indexes to speed it up. If your DB is static this should work.
One major speed up is the usage of map-reduce queries, where aggregations are carried out by multiple processes or computers. NoSQL databases like MongoDB can be used in such ways. But even MySQL has Cluster capabilities nowadays: http://www.mysql.de/products/cluster/scalability.html. SQL Server can be clustered as well.
So I guess the best first shot would be to fit the table's indexes to your queries. Each column a query uses as an argument (compare, count, etc.) should be indexed.
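As an illustration, assuming SQL Server and a made-up dbo.Readings table, an index matched to one particular query might look like this:

```sql
-- Hypothetical query to support:
--   SELECT SensorId, AVG(ReadingValue)
--   FROM dbo.Readings
--   WHERE ReadingTime >= @from AND ReadingTime < @to
--   GROUP BY SensorId;

-- The filtered column is the index key; the other columns the query
-- touches are included so the base table does not have to be read.
CREATE NONCLUSTERED INDEX IX_Readings_ReadingTime
ON dbo.Readings (ReadingTime)
INCLUDE (SensorId, ReadingValue);
```

Since the table is static, the extra write cost of such indexes only matters once, when they are built.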
If that does not do any better, you are probably counting and calculating a lot, and you should use map-reduce jobs and a DB which can handle them, like MongoDB: http://docs.mongodb.org/manual/aggregation/
I hope this helps
I recently learned that when I delete or modify a column in SQL Server, the space it used is not released at the back end, so I need to reindex and shrink the database. I did that, and my database size went down from 2.82 GB to 1.62 GB, which is good. But now I am confused and have several questions on the subject. Please help:
1. Is it necessary to recreate (refresh) indexes at a regular interval?
2. Is it necessary to shrink the database periodically to keep performance up?
3. If yes to the above, how often should I refresh the indexes and shrink the database?
I have no idea what to do about the disk space problem. I have 77,000 records taking up 2.82 GB of data space, which is not acceptable. There are only two tables, and only one of them has an nvarchar(max) column, so the database should need very little space. Can anyone help me with this? Thanks in advance.
I am going to simplify things a little for you so you might want to read up about the things I talk about in my answer.
There are two concepts you must understand: allocated space vs free space. A database might be 2GB in size but only use 1GB, so it has 2GB allocated with 1GB free. When you shrink a database it removes the free space, so free space ends up at about 0. Don't think a smaller file size is faster. As your database grows it has to allocate space again. When you shrink the file and it then grows again every so often, it cannot allocate space in a contiguous fashion. This creates fragmentation of the files, which slows you down even more.
With data files (.mdf files) this is not so bad, but shrinking the transaction log can lead to virtual log file fragmentation issues which can slow you down. So in a nutshell there is very little reason to shrink your database on a schedule. Go read about Virtual Log Files in SQL Server; there are a lot of articles about it. This is a good article about shrinking log files and why it is bad. Use it as a starting point.
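To see allocated versus used space for each file of the current database, a query like the following is a reasonable sketch:

```sql
-- size and SpaceUsed are both counted in 8 KB pages; /128 converts to MB.
SELECT
    name AS logical_name,
    type_desc,
    size / 128.0 AS allocated_mb,
    FILEPROPERTY(name, 'SpaceUsed') / 128.0 AS used_mb,
    (size - FILEPROPERTY(name, 'SpaceUsed')) / 128.0 AS free_mb
FROM sys.database_files;
```

A large free_mb on the data file is not a problem in itself. It is simply room the database can grow into without another allocation.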
Secondly indexes get fragmented over time. This will lead to bad performance of SELECT queries mainly but will also affect other queries. Thus you need to perform some index maintenance on the database. See this answer on how to defragment your indexes.
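To get an idea of how fragmented the indexes actually are before doing any maintenance, a query along these lines can be run in the database. 'LIMITED' is the cheapest scan mode.

```sql
SELECT
    OBJECT_NAME(ips.object_id) AS table_name,
    i.name AS index_name,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id
WHERE ips.index_id > 0            -- skip heaps
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

Very small indexes often report high fragmentation that has no practical effect, so page_count is worth reading alongside the percentage.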
Update:
When to rebuild indexes is not clear cut. By default an index rebuild locks the index, so it is essentially offline for the duration, and the rebuild consumes server resources. In your case it would be fast; 77,000 rows is nothing for SQL Server. If you have Enterprise edition you can do online index rebuilds, which do NOT lock the index but do consume more space.
So what you need to do is find a maintenance window. For example if your system is used from 8:00 till 17:00 you can schedule maintenance rebuilds after hours. Schedule this with SQL server agent. The script in the link can be automated to run.
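The statements a scheduled job would run could look roughly like this. The table dbo.Documents and the index IX_Documents_Title are hypothetical names.

```sql
-- Offline rebuild of every index on a hypothetical table.
ALTER INDEX ALL ON dbo.Documents REBUILD;

-- Enterprise edition only: keep the index available during the rebuild.
ALTER INDEX IX_Documents_Title ON dbo.Documents
REBUILD WITH (ONLINE = ON);
```

One caveat for this particular table: before SQL Server 2012, indexes containing LOB columns such as NVARCHAR(MAX) could not be rebuilt online, so on older versions the clustered index would still need an offline window.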
Your database is not big. I have seen SQL Server handle tables of 750GB without taking strain if the IO is split over several disks. The slowest part of any database server is not the CPU or the RAM but the IO pathways to the disks. This is a huge topic though. Back to your point: you are storing data in NVARCHAR(MAX) fields. I assume this is large text. After you shrink the database you see the size at 1.62GB, which means that each row in your database is about 1.62GB / 77,000, or roughly 22KB. This seems reasonable. Export the table to a text file and check the size. You will be surprised; it will probably be larger than 1.62GB.
Feel free to ask more detail if required.
I have a problem with a large database that resides on a single drive. It contains around a dozen tables; the two main ones are around 1GB each and cannot be made smaller. The disk queue for the database drive sits at around 96% to 100% even when the website that uses the DB is idle. What optimisation could be done, or what is the source of the problem? The DB on disk is 16GB in total and almost all the data is required: transaction data, customer information and stock details.
What are the reasons why the disk queue is always high no matter the website traffic?
What can be done to help improve performance on a database this size?
Any suggestions would be appreciated!
The database is an MS SQL 2000 Database running on Windows Server 2003 and as stated 16GB in size (Data File on Disk size).
Thanks
Well, how much memory do you have on the machine? If SQL Server can't keep the pages in memory, it is going to have to go to the disk to get its information. If your memory is low, you might want to consider upgrading it.
Since the database is so big, you might want to consider adding two separate physical drives and then putting the transaction log on one drive and partitioning some of the other tables onto the other drive (you have to do some analysis to see what the best split between tables is).
In doing this, you are allowing IO accesses to occur in parallel, instead of in serial, which should give you some more performance from your DB.
Before buying more disks and shifting things around, you might also update statistics and check your queries - if you are doing lots of table scans and so forth you will be creating unnecessary work for the hardware.
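Updating statistics is cheap to try. A minimal sketch, with dbo.Transactions as a made-up table name:

```sql
-- Hypothetical table name; FULLSCAN reads every row for the most accurate statistics.
UPDATE STATISTICS dbo.Transactions WITH FULLSCAN;

-- Or refresh statistics for every user table in the current database.
EXEC sp_updatestats;
```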
Your database isn't that big after all - I'd first look at tuning your queries. Have you profiled what sort of queries are hitting the database?
If your disk activity is that high while your site is idle, I would look for other processes that might be running that could be affecting it. For example, are you sure there aren't any scheduled backups running? Especially with a large DB, these could be running for a long time.
As Mike W pointed out, there is usually a lot you can do with query optimization with existing hardware. Isolate your slow-running queries and find ways to optimize them first. In one of our applications, we spent literally 2 months doing this and managed to improve the performance of the application, and the hardware utilization, dramatically.