Get CPU usage of operations on a table in a schema in HANA

I want to track how much CPU is being utilized during operations on tables in a given schema.
Is there any view or table that could help determine that? Even a view or query that snapshots the value would help.
I am aware of the view M_HOST_RESOURCE_UTILIZATION, but it tracks the overall CPU utilization of the host, not that of a single schema or table.

There is no such view in HANA, and probably not in any database, because tracking CPU utilization at the table level would add too much overhead.
One possible approach is to chain the following associations:
CPU utilization <-> number of threads in state 'Running' <-> SQL statement in that thread <-> Table accessed by the SQL
Walking that chain gives you a very rough approximation of per-table CPU utilization; a sketch follows.
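For illustration, a minimal sketch of that chain. It assumes a HANA revision where M_SERVICE_THREADS exposes STATEMENT_HASH and M_SQL_PLAN_CACHE exposes ACCESSED_TABLE_NAMES (check your revision's documentation), and MY_SCHEMA is a placeholder. Sampling this at a fixed interval and counting running threads per table yields the rough approximation described above.

    -- Sketch only: which statements are currently running, and which tables they touch.
    -- MY_SCHEMA is a hypothetical schema name; verify column availability on your revision.
    SELECT t.host,
           t.thread_id,
           t.thread_state,
           p.accessed_table_names,
           p.statement_string
    FROM   m_service_threads AS t
    JOIN   m_sql_plan_cache  AS p
           ON p.statement_hash = t.statement_hash
    WHERE  t.thread_state = 'Running'
      AND  p.accessed_table_names LIKE '%MY_SCHEMA.%';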

Related

Memory Buffer pool taken by a Table

My server is a physical server with 250 GB of RAM, and max server memory is configured to 230 GB. When I ran the DMV sys.dm_os_buffer_descriptors joined with other DMVs, I found a table taking almost 50 GB of buffer pool space. My question is: is this an issue? If so, what's the best way to tackle it? My PLE is very high and there are no slowness reports. Thanks.
The data used most often and most recently will remain in the buffer pool cache, so it is expected that 50 GB of table data will be cached when the table and its data are used often. Since your PLE is acceptable, there may be no cause for concern for now.
You may still want to take a look at the query plans that use the table in question. It could be that more data than needed is brought into the buffer pool cache by large scans when queries actually need far fewer pages; query and index tuning may be in order in that case. Tuning will also reduce CPU and other resource utilization, providing headroom for growth and for other queries in the workload.
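For reference, a sketch of the kind of DMV query the question describes, showing buffer pool usage per table in the current database (assumes SQL Server 2008 or later; adjust as needed):

    -- 8 KB pages per object currently in the buffer pool, converted to MB.
    SELECT OBJECT_SCHEMA_NAME(p.object_id) AS schema_name,
           OBJECT_NAME(p.object_id)        AS table_name,
           COUNT(*) * 8 / 1024             AS buffer_mb
    FROM   sys.dm_os_buffer_descriptors AS bd
    JOIN   sys.allocation_units         AS au
           ON au.allocation_unit_id = bd.allocation_unit_id
    JOIN   sys.partitions               AS p
           ON (au.type IN (1, 3) AND p.hobt_id      = au.container_id)
           OR (au.type = 2       AND p.partition_id = au.container_id)
    WHERE  bd.database_id = DB_ID()
    GROUP BY p.object_id
    ORDER BY buffer_mb DESC;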

Why is collect statistics in databases called a resource consuming activity?

Whenever we reach out to our Teradata DBAs to collect stats on specific tables, we get feedback that it is a resource-consuming activity and that they will do it when the system is relatively free, or on the weekends when there is no load on the system.
The tables for which stats collection is required are queried on an intra-day basis. The explain plan shows "High confidence" if we collect stats on a few columns.
So I just want to understand why stats collection is called a resource-consuming activity. If we do not collect stats on tables that are loaded on an intra-day basis, aren't we burdening the system by executing SQL for which the explain plans say "Collect stats"?
Thanks!
Yes, the absence of stats may result in less optimal access paths and lower performance than would otherwise be achievable.
But collecting stats is quite a bit more intensive than, say, checking whether a key value is present in a table, so on loaded systems it is not the wisest idea to add stats collection to the load mix.
At any rate, if the tables concerned are "loaded on an intra-day basis", they are highly volatile, and collecting stats for them might turn out not to be that useful after all, as any new load might render the existing stats completely obsolete or off. If you can provide reasonably accurate stats manually, do that.
EDIT
Oh yes, to answer the actual question you asked, "Why is collecting statistics in databases called a resource-consuming activity?": because it consumes resources, and considerably more than the average "normal" database transaction.
Statistics collection is a maintenance activity that requires balance in a production Teradata environment. Teradata continues to make strides in improving the efficiency of statistics maintenance with each release of the database. If I recall correctly, one of the more recent improvements is to identify unused statistics or objects and bypass refreshing those statistics during statistics collection. But it is a resource-intensive operation on large tables with multiple sets of statistics present.
The frequency with which you collect statistics will vary based on the size of the table, how the table is loaded, and the number of statistics that exist on that table. Tables that are "flushed and filled" require more frequent statistics collection than tables where data is appended or updated in place. The former should have statistics collected after loading; for the latter, it will vary based on the volume of data that changes versus the time since the last collection of statistics. Stale statistics can mislead the optimizer or cause it to abandon them in favor of random sampling.
Furthermore, as a table grows in relation to the size of the system, and depending on the known demographics of the table, the ability to rely on sample statistics in place of full statistics comes into play. Using the correct sample size reduces the cost of collecting the statistics.
It is not uncommon for statistics maintenance activities to be scheduled off hours or over the weekend. For large platforms, the collection of statistics across the system can be measured in hours. As a DBA I would be reluctant to refresh the statistics on a large production table in the middle of the day unless there was a query that was causing catastrophic problems (i.e. hot AMPing). Even then the remedy would be to prevent that query from running until statistics could be collected off hours.
If you have SLA’s defined in your environment and believe statistics collection would improve your ability to meet your SLAs, then a discussion with the DBA’s to come to a better understanding is necessary. Based on what you described, the DBA response is not surprising because they are trying to ensure the users receive the resources during the day.
Finally, if you have tables that are being loaded intra-day, the collection of SUMMARY statistics has low overhead and should be part of your ETL routine (a sketch follows below). Previously, collecting PARTITION statistics was also advisable irrespective of whether the table was actually partitioned, but I don't recall whether that has fallen out of favor in the most recent releases (16.xx) of Teradata. PARTITION statistics were also fairly low overhead.
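For illustration only, a sketch of what those low-overhead collections might look like; my_db, my_table and order_date are placeholders, and the exact COLLECT STATISTICS syntax varies by Teradata release, so verify it against your version:

    /* Cheap summary (row count, etc.) refresh after an intra-day load. */
    COLLECT SUMMARY STATISTICS ON my_db.my_table;

    /* Sampled stats on a frequently filtered column instead of a full scan. */
    COLLECT STATISTICS USING SAMPLE ON my_db.my_table COLUMN (order_date);

    /* The PARTITION statistics mentioned above. */
    COLLECT STATISTICS ON my_db.my_table COLUMN (PARTITION);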
Hope this helps.

Benchmarking Cube Processing in SSAS

I am currently working on a project to improve cube processing time. The cube currently consists of 50 facts and 160 dimensions and it takes about 4 hours to process the cube. What would be the best way to benchmark cube processing performance before embarking on troubleshooting bottlenecks. The largest dimension consists of about nine million records while the largest fact table consists of about 250 million records. How would you go about finding bottlenecks and what parameters would influence the processing time the most. Any help is highly appreciated.
Having done a lot of SSAS processing optimization, I would start with some monitoring. First, set up some performance counters to monitor available memory, disk, CPU and network on the SSAS server and the database server. Some good perfmon counters (and recommendations about baselining processing performance) are in section 4.1.1 of the SSAS performance guide.
Second, I would start a Profiler trace connected to SSAS with the default events. When processing is done, use Save As... Trace Table in Profiler, then look for the longest-duration events in the SQL table you saved it to (a sketch of that query is below); that tells you where to spend your optimization time.
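As an example, a minimal sketch of that lookup; dbo.CubeProcessingTrace is a placeholder for whatever name you give the table in the Save As... Trace Table dialog, and the Duration unit depends on the trace provider and version, so treat it as relative:

    SELECT TOP (20)
           EventClass,
           TextData,
           Duration,       -- compare relative durations; units vary by provider/version
           StartTime
    FROM   dbo.CubeProcessingTrace
    WHERE  Duration IS NOT NULL
    ORDER BY Duration DESC;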
Feel free to write back with your specific longest duration events if you need help. Please also specify exactly how you are processing (like ProcessFull on the database or something else).

SQL HW to performance ratio

I am seeking a way to find bottlenecks in SQL Server; it seems that more than 32 GB of RAM and more than 32 spindles on 8 cores are not enough. Are there any metrics, best practices or hardware comparisons (e.g. transactions per second)? Our daily closure takes hours and I want it down to minutes, or real time if possible. I was not able to merge more than 12k rows/sec. For now I have had to split the traffic across more than one server, but is that a proper solution for a ~50 GB database?
The merge is enclosed in a stored procedure and kept as simple as it can be: deduplicate input, insert new rows, update existing rows. I found that the more rows we put into a single merge, the more rows per second we get. The application server runs multiple threads and uses all the memory and CPU on its own dedicated server.
Follow a methodology like Waits and Queues to identify the bottlenecks; that is exactly what it is designed for. Once you have identified the bottleneck, you can also judge whether it is a hardware provisioning and sizing issue (and if so, which hardware is the bottleneck) or something else. A starting-point query is sketched below.
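A minimal sketch of that starting point, showing the top waits since the last restart (or since the wait stats were last cleared); the exclusion list here is deliberately short, and production scripts typically filter out many more benign wait types:

    SELECT TOP (10)
           wait_type,
           waiting_tasks_count,
           wait_time_ms,
           signal_wait_time_ms
    FROM   sys.dm_os_wait_stats
    WHERE  wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                             'SQLTRACE_BUFFER_FLUSH', 'BROKER_TASK_STOP')
    ORDER BY wait_time_ms DESC;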
The basic idea is to avoid random access to a disk, for both reads and writes. Without doing any analysis, a 50 GB database needs at least 50 GB of RAM. Then you have to make sure indexes are on a separate spindle from the data and the transaction logs, that you write as late as possible, and that critical tables are split over multiple spindles. Are you doing all of that?

Database Disk Queue too high, what can be done?

I have a problem with a large database I am working with, which resides on a single drive. The database contains around a dozen tables; the two main ones are around 1 GB each and cannot be made smaller. My problem is that the disk queue for the database drive is around 96% to 100% even when the website that uses the DB is idle. What optimisation could be done, or what is the source of the problem? The DB on disk is 16 GB in total and almost all the data is required: transaction data, customer information and stock details.
What are the reasons why the disk queue is always high no matter the website traffic?
What can be done to help improve performance on a database this size?
Any suggestions would be appreciated!
The database is an MS SQL 2000 database running on Windows Server 2003 and, as stated, 16 GB in size (data file size on disk).
Thanks
Well, how much memory do you have on the machine? If SQL Server can't keep the pages in memory, it is going to have to go to the disk to get its information. If your memory is low, you might want to consider upgrading it.
Since the database is so big, you might want to consider adding two separate physical drives, putting the transaction log on one drive and partitioning some of the other tables onto the other drive (you have to do some analysis to see what the best split between tables is).
In doing this, you allow IO accesses to occur in parallel instead of serially, which should give you some more performance from your DB; a sketch of the filegroup approach is below.
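A sketch only, with placeholder database, file and index names; the idea is simply to add a filegroup on the second physical drive and rebuild a hot index onto it (the syntax shown should work on SQL Server 2000, but test it first):

    /* Add a filegroup backed by a file on the second drive. */
    ALTER DATABASE MyShopDB ADD FILEGROUP SecondDrive;

    ALTER DATABASE MyShopDB
    ADD FILE (NAME = MyShopDB_Data2,
              FILENAME = 'E:\Data\MyShopDB_Data2.ndf',
              SIZE = 2GB)
    TO FILEGROUP SecondDrive;

    /* Rebuild an existing index on the new filegroup. */
    CREATE INDEX IX_Orders_CustomerId
        ON dbo.Orders (CustomerId)
        WITH DROP_EXISTING
        ON SecondDrive;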
Before buying more disks and shifting things around, you might also update statistics and check your queries: if you are doing lots of table scans and so forth, you will be creating unnecessary work for the hardware. A couple of SQL 2000-era checks are sketched below.
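For illustration, a sketch of those checks using commands that exist on SQL Server 2000; the table name is a placeholder:

    EXEC sp_updatestats;                         -- refresh statistics database-wide

    SET STATISTICS IO ON;                        -- run a suspect query and inspect
    SELECT * FROM dbo.Transactions WHERE CustomerId = 42;   -- logical/physical reads
    SET STATISTICS IO OFF;

    DBCC SHOWCONTIG ('dbo.Transactions');        -- check fragmentation on a key table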
Your database isn't that big after all - I'd first look at tuning your queries. Have you profiled what sort of queries are hitting the database?
If your disk activity is that high while your site is idle, I would look for other processes that might be running and affecting it. For example, are you sure there aren't any scheduled backups running? Especially with a large DB, these could run for a long time. A quick check is sketched below.
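A sketch of a quick check against SQL Agent's history tables, which exist on SQL Server 2000; look for jobs with long run_duration values (formatted HHMMSS):

    SELECT j.name,
           h.run_date,
           h.run_time,
           h.run_duration,
           h.run_status
    FROM   msdb.dbo.sysjobs AS j
    JOIN   msdb.dbo.sysjobhistory AS h
           ON h.job_id = j.job_id
    WHERE  h.step_id = 0                         -- job outcome rows only
    ORDER BY h.run_date DESC, h.run_time DESC;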
As Mike W pointed out, there is usually a lot you can do with query optimization on existing hardware. Isolate your slow-running queries and find ways to optimize them first. In one of our applications, we spent literally two months doing this and managed to improve the performance of the application, and its hardware utilization, dramatically.