What does exec sp_updatestats do?

What is the use of sp_updatestats? Can I run that in the production environment for performance improvement?

sp_updatestats updates statistics for every table in the database in which at least one row has changed. It does so using the default sample, meaning it doesn't scan all rows in the table, so it will likely produce less accurate statistics than the alternatives.
If you have a maintenance plan that includes 'rebuild indexes', it will also refresh statistics, and more accurately, because an index rebuild scans all rows. There is no need to update statistics after rebuilding indexes.
Manually updating a particular statistics object or a table with the UPDATE STATISTICS command gives you much better control over the process. For automating it, take a look here.
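For example, a minimal sketch of targeted updates, assuming a hypothetical table dbo.Orders with a statistics object named IX_Orders_CustomerID:
-- Update every statistics object on one table with the default sample
UPDATE STATISTICS dbo.Orders;
-- Update a single statistics object, scanning every row for maximum accuracy
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerID WITH FULLSCAN;
-- Or use a fixed sampling rate instead of the default
UPDATE STATISTICS dbo.Orders WITH SAMPLE 50 PERCENT;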
Auto-update fires only when the optimizer decides it has to. The math changed in 2012: before 2012, auto-update fired after 500 + 20% of the table's rows had changed; from 2012 onwards the threshold is SQRT(1000 * table rows). That makes it fire more often on large tables: for a 10-million-row table the old threshold is about 2,000,500 changed rows, while SQRT(1000 * 10,000,000) is about 100,000. Temporary tables behave differently, of course.
To conclude, sp_updatestats could actually do more damage than good, and is the least recommendable option.

Related

Postgres SQL sentence performance

I have a Postgres instance running on a 16-core/32 GB Windows Server workstation.
I followed performance improvement tips I saw in places like this: https://www.postgresql.org/docs/9.3/static/performance-tips.html.
When I run an update like:
analyze;
update amazon_v2
set states_id = amazon.states_id,
geom = amazon.geom
from amazon
where amazon_v2.fid = amazon.fid
where fid is the primary key in both tables and both have 68M records, it takes almost a day to run.
Is there any way to improve the performance of SQL sentences like this? Should I write a stored procedure to process it record by record, for example?
You don't show the execution plan but I bet it's probably performing a Full Table Scan on amazon_v2 and using an Index Seek on amazon.
I don't see how to improve performance here, since it's close to optimal already. The only thing I can think of is to use table partitioning and parallelize the execution.
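If you want to verify that, you can look at the plan PostgreSQL chooses without actually running the update; a minimal sketch using the tables from the question:
-- EXPLAIN alone shows the chosen plan without executing the statement
EXPLAIN
UPDATE amazon_v2
SET states_id = amazon.states_id,
    geom = amazon.geom
FROM amazon
WHERE amazon_v2.fid = amazon.fid;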
Another totally different strategy, is to update the "modified" rows only. Maybe you can track those to avoid updating all 68 million rows every time.
Your query is executed in a very long transaction. The transaction may be blocked by other writers; query pg_locks to find out.
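A minimal sketch of that check, using only the standard pg_locks view; lock requests that have not been granted point to blocking:
SELECT locktype, relation::regclass AS relation, pid, mode, granted
FROM pg_locks
WHERE NOT granted;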
Long transactions have a negative impact on the performance of autovacuum. Does the execution time increase over time? If so, check for table bloat.
Performance usually improves when big transactions are divided into smaller ones. Unfortunately, the operation is then no longer atomic, and there is no golden rule for the optimal batch size.
You should also follow advice from https://stackoverflow.com/a/50708451/6702373
Let's sum it up:
Update modified rows only (if only a few rows are modified)
Check locks
Check table bloat
Check hardware utilization (related to other issues)
Split the operation into batches (see the sketch after this list).
Replace updates with delete/truncate & insert/copy (this works if the update changes most rows).
(If nothing else helps) partition the table.
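As referenced above, here is a minimal batching sketch. It assumes fid is a dense integer key and uses an illustrative batch size of 1,000,000; the COMMIT inside the block requires PostgreSQL 11 or later, so on older versions you would drive the loop from the client instead:
DO $$
DECLARE
    batch_size bigint := 1000000;   -- illustrative value, tune for your workload
    max_fid    bigint;
    lo         bigint := 0;
BEGIN
    SELECT max(fid) INTO max_fid FROM amazon_v2;
    WHILE lo <= max_fid LOOP
        UPDATE amazon_v2
        SET states_id = amazon.states_id,
            geom      = amazon.geom
        FROM amazon
        WHERE amazon_v2.fid = amazon.fid
          AND amazon_v2.fid BETWEEN lo AND lo + batch_size - 1;
        lo := lo + batch_size;
        COMMIT;                     -- keeps each transaction small (PostgreSQL 11+)
    END LOOP;
END $$;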

Generated de-normalised View table

We have a system that makes use of a database View, which takes data from a few reference tables (lookups) and then does a lot of pivoting and complex work on a hierarchy table of (pretty much fixed and static) locations, returning a view of the data to the application.
This view is getting slow, as new requirements are added.
A solution that may be an option would be to create a normal table, select from the view into this table, and let the application use that highly indexed and fast table for its querying.
The issue, I guess, is that if the underlying tables change, the new table will show stale results. But the data that drives this table changes very infrequently. And if it does, a business/technical process could be put in place so that an 'Update the Table' procedure is run to refresh the data. Or even an update/insert trigger on the primary driving table?
Is this practice advised/ill-advised? And are there ways of making it safer?
The ideal solution is to optimise the underlying queries.
In SSMS run the slow query and include the actual execution plan (Ctrl + M), this will give you a graphical representation of how the query is being executed against your database.
Another helpful tool is to turn on IO statistics, this is usually the main bottleneck with queries, put this line at the top of your query window:
SET STATISTICS IO ON;
Check if SQL recommends any missing indexes (displayed in green in the execution plan), as you say the data changes infrequently so it should be safe to add additional indexes if needed.
In the execution plan you can hover your mouse over any element for more information. Check the estimated rows against the actual rows returned; if they differ greatly, update the statistics for the tables involved, which can help the query optimiser find a better execution plan.
To do this for all tables in a database:
USE [Database_Name]
GO
exec sp_updatestats
Still no luck in optimising the view / query?
Be careful with update triggers: if the schema changes on the view/table (say you add a new column to the source table), the new column will not be inserted into your 'optimised' table unless you update the trigger.
If it is not a business requirement to report on real-time data, there is not too much harm in having a separate optimized table for reporting (much like a data mart); just use a SQL Agent job to refresh it nightly during non-peak hours (see the sketch after the list of cons below).
There are a few cons to this approach though:
More storage space / duplicated data
More complex database
Additional workload during the refresh
Decreased cache hits
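If you accept those trade-offs, a minimal sketch of the refresh step a SQL Agent job could call nightly, assuming a hypothetical reporting table dbo.LocationReport loaded from a hypothetical view dbo.vLocationHierarchy with a matching column list:
CREATE PROCEDURE dbo.RefreshLocationReport
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
        -- Throw away the previous snapshot and rebuild it from the view
        TRUNCATE TABLE dbo.LocationReport;
        INSERT INTO dbo.LocationReport
        SELECT * FROM dbo.vLocationHierarchy;
    COMMIT TRANSACTION;
END;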

SQL SERVER - Execution plan

I have a VIEW in both databases. In one database it takes less than 1 second to run, but in the other it takes 1 minute or more. I checked the indexes and everything is the same. The difference in row counts between the two databases is less than 10 million rows.
I checked the execution plan, and what I found is that in the database that takes more time there are 3 Hash Match operators (1 aggregate and 2 right outer joins) responsible for 100% of the query batch. The other database does not have these in its execution plan.
Can anyone tell me where I can begin looking for the problem?
Thank you, and sorry for the bad English.
You can check this link here for a quick explanation on different types of joins.
Basically, with the information you've given us, here are some of the alternatives for what might be wrong:
One DB has indexes that the other doesn't.
The size difference between some of the joined tables in one DB over the other is dramatic enough to change the type of join used.
While your indexes might be the same on both DB table groups, as you said, it's possible the other DB has outdated/bad statistics or too much index fragmentation, resulting in sub-optimal plans (see the quick check below).
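As a quick check on the slower database, a minimal sketch, assuming a hypothetical table dbo.SomeTable referenced by the view:
-- When was each statistics object on the table last updated?
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.SomeTable');
-- How fragmented are the table's indexes?
SELECT index_id, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SomeTable'), NULL, NULL, 'LIMITED');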
EDIT:
Regarding your comment below, it's true that rebuilding indexes is similar to dropping & recreating indexes. And since creating indexes also creates the statistics for those indexes, rebuilding will take care of them as well. Sometimes that's not enough however.
While officially default statistics should be built with about a 20% sampling rate of the actual data, in reality the sampling rate can be as low as just a few percent, depending on how massive the table is. It's rarely anywhere near 20%. Because of that, many DBAs build statistics manually with FULLSCAN to obtain a 100% sampling rate.
The statistics take the same amount of storage space either way, so there is really no downside to this aside from the extra time required in maintenance plans. In my current project, we have several situations where the default sampling rate for the statistics is not enough and would still produce bad plans, so we routinely update all statistics with FULLSCAN every few weeks to make sure the performance stays top notch.
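A minimal sketch of that kind of manual update, again with a hypothetical table name:
-- Rebuild every statistics object on the table from a 100% sample
UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN;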

What impact do statistics have on a table

When I add an index to a table there is an obvious benefit in searching, however there is also a cost involved with insert/update/delete statements as the index needs to be updated.
If I create a new statistic on a table, does it incur similar costs to an index?
Whatever statistics are being used to find the data for a query will be checked to see whether they are up to date. If they are not, SQL Server will update them (based on a random sample), and your query will take a performance hit while it waits for the stats to update.
From what I've found, statistics can be set to Auto Update Statistics Asynchronously. This causes the current query to use the old statistics, but tells SQL Server to update the stats in the background for next time. The current query could still perform badly if a lot of data has changed.
Main source: MSDN
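A minimal sketch of switching that option on, with a hypothetical database name; it only has an effect while AUTO_UPDATE_STATISTICS itself is ON:
-- Keep auto-updates, but let them run in the background instead of blocking the query
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS_ASYNC ON;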

Postgres query optimization

On Postgres 9.0, I set both enable_indexscan and enable_seqscan to off. Why does that improve query performance by 2x?
This may help some queries run faster, but is almost certain to make other queries slower. It's interesting information for diagnostic purposes, but a bad idea for a long-term "solution".
PostgreSQL uses a cost-based optimizer, which looks at the costs of all possible plans based on statistics gathered by scanning your tables (normally by autovacuum) and costing factors. If it's not choosing the fastest plan, it is usually because your costing factors don't accurately model actual costs for your environment, statistics are not up-to-date, or statistics are not fine-grained enough.
After turning enable_indexscan and enable_seqscan back on:
I have generally found the cpu_tuple_cost default to be too low; I have often seen better plans chosen by setting that to 0.03 instead of the default 0.01; and I've never seen that override cause problems.
If the active portion of your database fits in RAM, try reducing both seq_page_cost and random_page_cost to 0.1.
Be sure to set effective_cache_size to the sum of shared_buffers and whatever your OS is showing as cached.
Never disable autovacuum. You might want to adjust parameters, but do that very carefully, with small incremental changes and subsequent monitoring.
You may need to occasionally run explicit VACUUM ANALYZE or ANALYZE commands, especially for temporary tables or tables which have just had a lot of modifications and are about to be used in queries.
You might want to increase default_statistics_target, from_collapse_limit, join_collapse_limit, or some geqo settings; but it's hard to tell whether those are appropriate without a lot more detail than you've given so far.
You can try out a query with different costing factors set on a single connection. When you confirm a configuration which works well for your whole mix (i.e., it accurately models costs in your environment), you should make the updates in your postgresql.conf file.
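A minimal sketch of such a single-connection experiment; the values and the query are only placeholders to substitute with your own:
-- These SET commands affect the current session only
SET cpu_tuple_cost = 0.03;
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;
SET effective_cache_size = '24GB';  -- illustrative: roughly shared_buffers plus OS cache
-- Re-run the query under test and compare the plan and timings
EXPLAIN ANALYZE SELECT count(*) FROM some_table;  -- hypothetical query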
If you want more targeted help, please show the structure of the tables, the query itself, and the results of running EXPLAIN ANALYZE for the query. A description of your OS and hardware helps a lot, too, along with your PostgreSQL configuration.
Why?
The most logical answer is because of the way your database tables are configured.
Without you posting your table schemas I can only hazard a guess that your indices don't have a high cardinality.
That is to say, if your index contains too much information to be useful, it will be far less efficient, or indeed slower.
Cardinality is a measure of how unique a row in your index is. The lower the cardinality, the slower your query will be.
A perfect example is having a boolean field in your index; perhaps you have a Contacts table in your database and it has a boolean column that records true or false depending on whether the customer would like to be contacted by a third party.
So, if you ran 'select * from Contacts where OptIn = true', you can imagine that you'd return a lot of Contacts; say 50% of them in our case.
Now if you add this 'OptIn' column to an index on that same table, it stands to reason that no matter how fine the other selectors are, you will always return 50% of the table because of the value of 'OptIn'.
This is a perfect example of low cardinality; it will be slow because any query involving that index will have to select 50% of the rows in the table, and only then apply further WHERE filters to reduce the dataset.
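A minimal sketch of that scenario; the table and index are made up for illustration:
CREATE TABLE Contacts (
    id     serial PRIMARY KEY,
    name   text,
    OptIn  boolean
);
-- Low-cardinality index: only two possible values, so roughly half the table matches
CREATE INDEX idx_contacts_optin ON Contacts (OptIn);
-- The planner will usually prefer a sequential scan here, because the index
-- cannot narrow the result down to a small fraction of the rows
EXPLAIN SELECT * FROM Contacts WHERE OptIn = true;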
Long story short: if your indices include bad fields or simply represent every column in the table, then the SQL engine has to resort to testing row-by-agonizing-row.
Anyway, the above is theoretical in your case, but it is a known, common reason why queries suddenly start taking much longer.
Please fill in the gaps regarding your data structure, index definitions and the actual query that is really slow!