SQL Server 2008: I/O Wait Time per Database File

I am running SQL Server 2008 Enterprise Edition and want to monitor the following performance metrics, ideally via dynamic management views (i.e. from within SQL):
Average/Maximum Read/Write I/O Waits in ms per database file
for sliding time window.
That is, four numbers per database file: avg read wait, max read wait, avg write wait, max write wait. All in ms, and all for some sane (or, even better, configurable) sliding time window.
How can I do that?
PS: I have the VIEW SERVER STATE permission and can read sys.dm_os_performance_counters, sys.database_files, sys.dm_io_virtual_file_stats, etc.
PS2: At least one tool (Quest Spotlight 7 for SQL Server) is able to provide Max I/O Wait in ms per database file, so there has to be some way.

Below is the query that SSMS's Activity Monitor uses; it labels the io_stall field as total wait time. The fs.io_stall_read_ms and fs.io_stall_write_ms fields are included here as well to get the read- and write-specific numbers.
SELECT
d.name AS [Database],
f.physical_name AS [File],
(fs.num_of_bytes_read / 1024.0 / 1024.0) AS [Total MB Read],
(fs.num_of_bytes_written / 1024.0 / 1024.0) AS [Total MB Written],
(fs.num_of_reads + fs.num_of_writes) AS [Total I/O Count],
fs.io_stall AS [Total I/O Wait Time (ms)],
fs.size_on_disk_bytes / 1024 / 1024 AS [Size (MB)],
fs.io_stall_read_ms AS [Read I/O Wait Time (ms)],
fs.io_stall_write_ms AS [Write I/O Wait Time (ms)]
FROM sys.dm_io_virtual_file_stats(default, default) AS fs
INNER JOIN sys.master_files f ON fs.database_id = f.database_id AND fs.file_id = f.file_id
INNER JOIN sys.databases d ON d.database_id = fs.database_id;
This query only gives you the totals. You'd have to run it at some interval and record the results in a temp table with a time stamp. You could then query this table to get your min/max/avg as needed. The sliding time window would just be a function of how much data you keep in that table and what time period you query.
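A minimal sketch of that approach (the table name IoStatsSnapshot, the one-minute collection schedule and the 30-minute window are illustrative, not from the original): snapshot sys.dm_io_virtual_file_stats into a permanent table on a schedule, then diff consecutive snapshots to get per-interval waits per I/O, and aggregate over whatever window you keep.
CREATE TABLE dbo.IoStatsSnapshot
(
    capture_time      datetime2(0) NOT NULL DEFAULT SYSDATETIME(),
    database_id       int          NOT NULL,
    file_id           int          NOT NULL,
    num_of_reads      bigint       NOT NULL,
    num_of_writes     bigint       NOT NULL,
    io_stall_read_ms  bigint       NOT NULL,
    io_stall_write_ms bigint       NOT NULL
);

-- Run on a schedule (e.g. a SQL Agent job every minute); the DMV counters are cumulative since instance startup
INSERT INTO dbo.IoStatsSnapshot
    (database_id, file_id, num_of_reads, num_of_writes, io_stall_read_ms, io_stall_write_ms)
SELECT database_id, file_id, num_of_reads, num_of_writes, io_stall_read_ms, io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

-- Avg/max read/write wait per I/O over the last 30 minutes (the sliding window)
WITH s AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY database_id, file_id ORDER BY capture_time) AS rn
    FROM dbo.IoStatsSnapshot
    WHERE capture_time >= DATEADD(MINUTE, -30, SYSDATETIME())
), d AS
(
    -- per-interval deltas between consecutive snapshots
    SELECT cur.database_id, cur.file_id,
           1.0 * (cur.io_stall_read_ms  - prev.io_stall_read_ms)  / NULLIF(cur.num_of_reads  - prev.num_of_reads, 0)  AS read_wait_ms_per_io,
           1.0 * (cur.io_stall_write_ms - prev.io_stall_write_ms) / NULLIF(cur.num_of_writes - prev.num_of_writes, 0) AS write_wait_ms_per_io
    FROM s AS cur
    JOIN s AS prev
      ON prev.database_id = cur.database_id
     AND prev.file_id     = cur.file_id
     AND prev.rn          = cur.rn - 1
)
SELECT database_id, file_id,
       AVG(read_wait_ms_per_io)  AS avg_read_wait_ms,
       MAX(read_wait_ms_per_io)  AS max_read_wait_ms,
       AVG(write_wait_ms_per_io) AS avg_write_wait_ms,
       MAX(write_wait_ms_per_io) AS max_write_wait_ms
FROM d
GROUP BY database_id, file_id;
A counter reset (instance restart) will produce negative deltas, so you may want to discard those rows; intervals with no reads or writes come back as NULL and simply drop out of the aggregates.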

The problem you are going to have is that SQL Server doesn't necessarily track the level of detail you are looking for per file. You will probably have to use Performance Monitor as well: a combination approach, looking at Performance Monitor for details on disk I/O over time together with the aforementioned SQL monitoring techniques, to get a more complete picture. I hope that helps.

You can use a few scripts to pull the metrics. I have enabled the Data Collection / data collection server built in to SQL Server 2008. It collects the metrics from multiple instances and stores them in a SQL Server instance you designate as the collector. The reports provided for each instance are adequate for most purposes. I am sure there are third-party tools that go beyond the capabilities of the data collector, as it just reports on performance, disk usage and query statistics without performance hints.
This gives you a data file growth projection and a growth event summary for both logs and data files; however, I do not know if it will give you the metrics you are looking for per file or filegroup. :)
NOTE: If you use the data collection warehouse, you should consider rebuilding indexes periodically as its size grows. It collects approx. 20 MB/day of data in my scenario.

See the following for collecting per-file wait statistics: http://msdn.microsoft.com/en-us/library/ms187309(v=sql.105).aspx
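As a quick check without building any collection infrastructure, the cumulative stall counters in sys.dm_io_virtual_file_stats already give an average wait per I/O per file since instance startup (not a sliding window); a sketch:
SELECT DB_NAME(fs.database_id) AS [Database],
       mf.physical_name        AS [File],
       fs.io_stall_read_ms  * 1.0 / NULLIF(fs.num_of_reads, 0)  AS [Avg Read Wait (ms)],
       fs.io_stall_write_ms * 1.0 / NULLIF(fs.num_of_writes, 0) AS [Avg Write Wait (ms)]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
JOIN sys.master_files AS mf
  ON mf.database_id = fs.database_id
 AND mf.file_id     = fs.file_id;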

Related

BigQuery. Long execution time on small datasets

I created a new Google Cloud project and set up a BigQuery dataset. I tried different queries; they all take too long to execute. Currently we don't have a lot of data, so high performance was expected.
Below are some examples of queries and their execution time.
Query #1 (Job Id bquxjob_11022e81_172cd2d59ba):
select date(installtime) regtime
,count(distinct userclientid) users
,sum(fm.advcost) advspent
from DWH.DimUser du
join DWH.FactMarketingSpent fm on fm.date = date(du.installtime)
group by 1
The query failed after 1+ hour with the error "Query exceeded resource limits. 14521.457814668494 CPU seconds were used, and this query must use less than 12800.0 CPU seconds."
Query execution plan: https://prnt.sc/t30bkz
Query #2 (Job Id bquxjob_41f963ae_172cd41083f):
select fd.date
,sum(fd.revenue) adrevenue
,sum(fm.advcost) advspent
from DWH.FactAdRevenue fd
join DWH.FactMarketingSpent fm on fm.date = fd.date
group by 1
Execution took 59.3 sec, 7.7 MB processed, which is too slow.
Query Execution plan: https://prnt.sc/t309t4
Query #3 (Job Id bquxjob_3b19482d_172cd31f629):
select date(installtime) regtime
,count(distinct userclientid) users
from DWH.DimUser du
group by 1
Execution time was 5.0 sec elapsed, 42.3 MB processed, which is not terrible but should be faster for such small volumes of data.
Tables used :
DimUser - Table size 870.71 MB, Number of rows 2,771,379
FactAdRevenue - Table size 6.98 MB, Number of rows 53,816
FactMarketingSpent - Table size 68.57 MB, Number of rows 453,600
The question is: what am I doing wrong that makes query execution time so long? If everything is OK, I would be glad to hear any advice on how to reduce execution time for such simple queries. If anyone from Google reads my question, I would appreciate it if the job IDs were checked.
Thank you!
P.S. Previously I had experience using BigQuery on other projects, and the performance and execution times were incredibly good for tables of 50+ TB.
Posting the same reply I've given in the GCP Slack workspace:
Your first two queries both look like you have one particular worker that is overloaded. You can see this because, in the compute section, the max time is very different from the avg time. This could be for a number of reasons, but I can see that you are joining a table of 700k+ rows (looking at the 2nd input) to a table of ~50k (looking at the first input). This is not good practice; you should switch it so the larger table is the leftmost table. See https://cloud.google.com/bigquery/docs/best-practices-performance-compute?hl=en_US#optimize_your_join_patterns
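As an illustration of that guideline, Query #2 could list FactMarketingSpent (453,600 rows) before FactAdRevenue (53,816 rows). A sketch only: BigQuery may reorder joins on its own, so this is not guaranteed to change the plan.
select fm.date
,sum(fd.revenue) adrevenue
,sum(fm.advcost) advspent
from DWH.FactMarketingSpent fm
join DWH.FactAdRevenue fd on fd.date = fm.date
group by 1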
You may also have heavy skew in your join keys (e.g. 90% of rows are on 1/1/2020, or NULL). Check this.
For the third query, that time is expected; try an approximate count instead to speed it up. Also note that BQ starts to get better if you perform the same query over and over, so this will get quicker.
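For Query #3 that could look like the following sketch, using APPROX_COUNT_DISTINCT, which trades a small amount of accuracy for speed:
select date(installtime) regtime
,approx_count_distinct(userclientid) users
from DWH.DimUser du
group by 1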

Teradata Current CPU utilization (Not User level and no History data)

I want to run heavy extractions, basically to migrate data from Teradata to a cloud warehouse, and I want to check the current overall Teradata CPU utilization (as a percentage) so I can increase the number of extraction processes accordingly.
I know this type of information is available in "dbc.resusagespma", but that looks like historical data rather than the current figures we can see in Viewpoint.
Can we get such run-time information with SQL in Teradata?
This info is returned by one of the PMPC API functions, syslib.MonitorPhysicalSummary; of course, you need EXECUTE FUNCTION rights:
SELECT * FROM TABLE (MonitorPhysicalSummary()) AS t

BigQuery Count Appears to be Processing Data

I noticed that running a SELECT count(*) FROM myTable on my larger BQ tables yields long running times, upwards of 30/40 seconds despite the validator claiming the query processes 0 bytes. This doesn't seem quite right when 500 GB queries run faster. Additionally, total row counts are listed under details -> Table Info. Am I doing something wrong? Is there a way to get total row counts instantly?
When you run a count, BigQuery still needs to allocate resources (such as slot units, shards, etc.). You might be reaching some limits, which causes a delay. For example, the default number of slots per project is 2,000.
The BigQuery execution plan provides very detailed information about the process, which can help you better understand the source of the delay.
One way to overcome this is to use an approximate method described in this link.
This slide by Google might also help you.
For more details, see this video about how to understand the execution plan.
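If all that is needed is the total row count of a table, it can also be read from table metadata rather than computed with COUNT(*) at all. A sketch using the dataset's __TABLES__ view, as an alternative to the approximate method mentioned above (the project, dataset and table names are placeholders):
SELECT table_id, row_count, size_bytes
FROM `myproject.mydataset.__TABLES__`
WHERE table_id = 'myTable';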

How to understand the statistics in an Oracle trace file, such as CPU, elapsed time, query, etc.

I am learning query optimization in Oracle, and I know that the trace file contains statistics about the query execution and the EXPLAIN PLAN of the query.
At the bottom of the trace file is the EXPLAIN PLAN of the query. My first question is: does the part "time = 136437 us" show the duration of each step of query execution? What does "us" mean? Is it a unit of time?
In addition, can anyone explain what statistics such as count, cpu, elapsed, disk and query mean? I have googled and read the Oracle docs about them already, but I still cannot understand them. Can anyone clarify the meaning of those stats?
Thanks in advance. I am new and sorry for my English.
The smallest unit of data access in Oracle Database is a block. Not a row.
Each block can store many rows.
The database can access a block in current or consistent mode.
Current = as the block exists "right now".
Consistent = as the block existed at the time your query started.
The query and current columns report how many times the database accessed a block in consistent (query) and current mode.
When accessing a block it may already be in the buffer cache (memory). If so, no disk access is needed. If not, it has to do a physical read (pr). The disk column is a count of the total physical reads.
The stats for each line in the plan are the figures for that operation, plus the sum of all its child operations.
In simple terms, the database processes the plan by accessing the first child operation first, then passing the rows up to its parent, then processing the other child operations of that parent in order. Child operations are indented from their parent in the display.
So the database processed your query like so:
Read 2,000 rows from CUSTOMER. This required 749 consistent block gets and 363 disk reads (cr and pr values on this row). This took 10,100 microseconds.
Read 112,458 rows from BOOKING. This did 8,203 consistent reads and zero disk reads. This took 337,595 microseconds.
Joined these two tables together using a hash join. The CR, PR, PW (physical writes) and time values are the sum of the operations below this, plus whatever work this operation did. So the hash join:
did 8,952 - ( 749 + 8,203 ) = zero consistent reads
did 363 - ( 363 + 0 ) = zero physical reads
took 1,363,447 - ( 10,100 + 337,595 ) = 1,015,752 microseconds to execute
Notice that the CR & PR totals for the hash join match the query and disk totals in the fetch line?
The count column reports the number of times that operation happened. A fetch is a call to the database to get rows. So the client called the database 7,499 times. Each time it received ceil( 112,458 / 7,499 ) = 15 rows.
CPU is the total time in seconds the server's processors were executing that step. Elapsed is the total wall clock time, which is the CPU time plus any extra work, such as disk reads, network time, etc.
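Put generally, the same arithmetic works for every plan line and for cr, pr, pw and time alike (this is just a restatement of the hash join example above, not something printed in the trace file itself):
own work of an operation = value reported on its line - sum of the values reported on its direct child lines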

out of memory sql execution

I have the following script:
SELECT
DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946
FROM
DEPT_TAB DEPT
LEFT OUTER JOIN
SDP_TAB SDP ON SDP.F03 = DEPT.F03,
CAT_TAB CAT
ORDER BY
DEPT.F03
The tables are huge. When I execute the script in SQL Server directly it takes around 4 min, but when I run it in the third-party program (SMS LOC, based on Delphi) it gives me the error
<msg> out of memory</msg> <sql> the code </sql>
Is there any way I can lighten the script so it can be executed? Or has anyone had the same problem and solved it somehow?
I remember having had to resort to the ROBUST PLAN query hint once on a query where the query-optimizer kind of lost track and tried to work it out in a way that the hardware couldn't handle.
=> http://technet.microsoft.com/en-us/library/ms181714.aspx
But I'm not sure I understand why it would work for one 'technology' and not another.
Then again, the error message might not be from SQL but rather from the 3rd-party program that gathers the output and does so in a 'less than ideal' way.
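If you want to try that hint on the script above, it goes in an OPTION clause at the very end of the statement; a sketch only, with no guarantee it helps in this particular case:
SELECT
DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946
FROM
DEPT_TAB DEPT
LEFT OUTER JOIN
SDP_TAB SDP ON SDP.F03 = DEPT.F03,
CAT_TAB CAT
ORDER BY
DEPT.F03
OPTION (ROBUST PLAN);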
Consider adding paging to the user edit screen and the underlying data call. The point is that you don't need to see all the rows at one time, but they are available to the user upon request.
This will alleviate much of your performance problem.
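A sketch of what that paging could look like over the script above, using ROW_NUMBER() (which also works on older SQL Server versions that lack OFFSET/FETCH); the page size of 500 rows is illustrative:
WITH numbered AS
(
    SELECT
        DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
        CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946,
        ROW_NUMBER() OVER (ORDER BY DEPT.F03) AS rn
    FROM DEPT_TAB DEPT
    LEFT OUTER JOIN SDP_TAB SDP ON SDP.F03 = DEPT.F03,
         CAT_TAB CAT
)
SELECT F03, F238, F04, F1022, F17, F1023, F1946
FROM numbered
WHERE rn BETWEEN 1 AND 500   -- first page of 500 rows; fetch later pages on request
ORDER BY rn;
On SQL Server 2012 and later, OFFSET ... FETCH after the ORDER BY achieves the same thing more directly.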
I had a project where I had to add over 7 million individual lines of T-SQL code via batch (I couldn't figure out how to programmatically leverage the new SEQUENCE command). The problem was that there was a limited amount of memory available on my VM (I was allocated the maximum amount of memory for this VM). Because of the large number of lines of T-SQL code, I had to first test how many lines it could take before the server crashed. For whatever reason, SQL Server (2012) doesn't release the memory it uses for large batch jobs such as mine (we're talking around 12 GB of memory), so I had to reboot the server every million or so lines. This is what you may have to do if resources are limited for your project.