Azure SQL sessions blocked in status PAGEIOLATCH_SH - azure-sql-database

I've got a Spring application with Hibernate that sporadically stops working because all 30 connections of its connection pool are blocked. While these connections were blocked, I executed a bunch of queries to find the cause.
Each connection executed the same join statement.
The execution plan of that join looks like this:
This query
SELECT *
FROM sys.dm_exec_requests
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
returns for each connection something like:
status = suspended
command = SELECT
blocking_session_id = 0
wait_type = last_wait_type = PAGEIOLATCH_SH
wait_time = 3
cpu_time = 12909
total_elapsed_time = 3723943
logical_reads = 7986970
The 3 indexes involved have sizes of about 16 GB (the last one in the screenshot above), 1 GB and 500 MB respectively.
This is happening inside a SQL database in an elastic pool with 24 vCores, Gen5, and a max data size of 2418 GB.
The resource monitor of that elastic pool looked reasonable (arrow indicates the correct time):
Anything else I could check? Any ideas what could be the reason for this?

Inserting data on an increasing key (like an identity column) means that, if a good number of connections is trying to ingest data into the table, many parallel threads will be going after the last data page of that table, all trying to add rows to that same page.
If you put that table in memory (In-Memory OLTP) you may not have this contention problem, but in this scenario the table is not in-memory, so the contention manifests itself as PAGELATCH_EX waits.
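Since the question asks what else to check: PAGEIOLATCH_SH waits are for pages being read from storage, while last-page insert contention shows up as PAGELATCH_EX, so it can help to look at per-index latch waits to see which of the two dominates. A minimal sketch using only built-in DMVs, run in the affected database (no assumptions about your schema):
-- cumulative per-index page latch vs page IO latch waits
SELECT OBJECT_NAME(ios.object_id) AS table_name,
       i.name AS index_name,
       ios.page_io_latch_wait_count,
       ios.page_io_latch_wait_in_ms,
       ios.page_latch_wait_count,
       ios.page_latch_wait_in_ms
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
    ON i.object_id = ios.object_id
    AND i.index_id = ios.index_id
ORDER BY ios.page_io_latch_wait_in_ms DESC;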
Another option is to make the clustered index a GUID. That will distribute the inserts among all the data pages, but it may cause page splits.
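A minimal sketch of what the GUID option could look like; the table and column names here are made up for illustration, and NEWID() is used deliberately (a sequential GUID would re-create the same hot last page):
-- hypothetical table: a random GUID clustered key spreads inserts across many pages
CREATE TABLE dbo.IngestedRow
(
    RowId     uniqueidentifier NOT NULL CONSTRAINT DF_IngestedRow_RowId DEFAULT NEWID(),
    Payload   nvarchar(max)    NULL,
    CreatedAt datetime2        NOT NULL CONSTRAINT DF_IngestedRow_CreatedAt DEFAULT SYSUTCDATETIME(),
    CONSTRAINT PK_IngestedRow PRIMARY KEY CLUSTERED (RowId)
);
-- trade-off noted above: no last-page hotspot, but expect page splits and fragmentation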

Related

CETAS times out for large tables in Synapse Serverless SQL

I'm trying to create a new external table using a CETAS (CREATE EXTERNAL TABLE AS SELECT * FROM <table>) statement from an already existing external table in Azure Synapse Serverless SQL Pool. The table I'm selecting from is a very large external table built on around 30 GB of data in Parquet format stored in ADLS Gen2 storage, but the query always times out after about 30 minutes. I've tried using premium storage and also tried out most if not all of the suggestions made here, but it didn't help and the query still times out.
The error I get in Synapse Studio is:
Statement ID: {550AF4B4-0F2F-474C-A502-6D29BAC1C558} | Query hash: 0x2FA8C2EFADC713D | Distributed request ID: {CC78C7FD-ED10-4CEF-ABB6-56A3D4212A5E}. Total size of data scanned is 0 megabytes, total size of data moved is 0 megabytes, total size of data written is 0 megabytes. Query timeout expired.
The core use case is that assuming I only have the external table name, I want to create a copy of the data over which that external table is created in Azure storage itself.
Is there a way to resolve this timeout issue or a better way to solve the problem?
This is a limitation of Serverless.
Query timeout expired
The error Query timeout expired is returned if the query executed for more than 30 minutes on serverless SQL pool. This is a limit of serverless SQL pool that cannot be changed. Try to optimize your query by applying best practices, or try to materialize parts of your queries using CETAS. Check if there is a concurrent workload running on the serverless pool, because the other queries might take the resources. In that case you might split the workload over multiple workspaces.
Self-help for serverless SQL pool - Query Timeout Expired
The core use case is that assuming I only have the external table name, I want to create a copy of the data over which that external table is created in Azure storage itself.
It's simple to do with a Data Factory copy job, a Spark job, or AzCopy.
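If you do want to stay inside serverless SQL, the "materialize parts of your queries using CETAS" advice quoted above can be applied here by splitting the copy into several smaller CETAS statements that each finish inside the 30-minute limit. A rough sketch, assuming the source table has a column you can slice on and that an external data source and file format already exist; all names below are placeholders:
CREATE EXTERNAL TABLE dbo.MyTable_copy_slice1
WITH (
    LOCATION    = 'copy/mytable/slice1/',   -- placeholder output folder
    DATA_SOURCE = MyAdlsDataSource,         -- assumed existing external data source
    FILE_FORMAT = MyParquetFormat           -- assumed existing parquet file format
)
AS
SELECT *
FROM dbo.MyLargeExternalTable
WHERE SliceColumn BETWEEN 1 AND 1000000;    -- placeholder predicate; repeat per slice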

Is there a way to examine the contents of a SAP HANA DB transaction log

I have a SAP B1 system that's being migrated from Microsoft SQL to HANA DB. Our solution in the staging environment is producing huge transaction logs, tens of gigabytes in an hour, but the system isn't receiving production workloads yet. SAP have indicated that the database is fine and that it's our software that's at fault, but I'm not clear on how to identify this. As far as I can tell each program is sleeping between poll intervals, and the intervals are not high (one query per minute). We traced SQL for an hour, and there were only in the region of 700 updates, but still tens of gigabytes of transaction log.
Does anybody have an idea how to debug the transaction log? I'd like to see what's being recorded.
Thanks.
The main driver of high transaction log data is not the number of SQL commands executed but the size/number of records affected by those commands.
In addition to DML commands (DELETE/INSERT/UPDATE), DDL commands like CREATE and ALTER TABLE also produce redo log data. For example, re-partitioning a large table will produce a large volume of redo logs.
For HANA there are tools (hdblogdiag) that allow inspecting the log volume structures. However, the usage and output of this (and similar) tools absolutely require extensive knowledge of the internals of how HANA handles redo logs.
For the OP's situation, I recommend checking the volume of data changes caused by both DML and DDL.
We had the same issue.
There is a bug in SAP HANA SPS11 < 112.06 and SPS12 < 122.02 in the LOB garbage collector for the row store.
You can take a look at SAP Note 2351467.
In short, you can either
upgrade HANA
or convert the rowstore tables containing LOB columns into columnstore with the query ALTER TABLE "<schema_name>"."<table_name>" COLUMN;
You can find the list with this query:
select distinct
    lo.schema_name,
    lo.table_name
from sys.m_table_lob_files lo
inner join tables ta
    on lo.table_oid = ta.table_oid
    and ta.table_type = 'ROW'
or disable the row store lob garbage collector by editing the indexserver.ini to set "garbage_lob_file_handler_enabled = false" under the [row_engine] section.
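If many tables come back from that query, the same query can also be used to generate the ALTER statements instead of typing them by hand; a small sketch built on the query above, using HANA's || string concatenation:
select distinct
    'ALTER TABLE "' || lo.schema_name || '"."' || lo.table_name || '" COLUMN;' as convert_stmt
from sys.m_table_lob_files lo
inner join tables ta
    on lo.table_oid = ta.table_oid
    and ta.table_type = 'ROW'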

Database copy limit per database reached. The database X cannot have more than 10 concurrent database copies (Azure SQL)

In our application, we have a master database 'X'. For each new client, we create a new database copy of master database 'X'.
I am using the following SQL command, which is executed against the Azure SQL server.
CREATE DATABASE [NEW NAME] AS COPY OF [MASTER DB]
We are using a custom queue tier so that we can create more than one client at a time, in parallel.
I am facing an issue in the following scenario.
I am trying to create 70 clients. Once 25 clients have been created, I get the error below.
Database copy limit per database reached. The database 'BlankDBClient' cannot have more than 10 concurrent database copies
Can you please share your thoughts on this?
SQL Azure has logic to do various operations online/automatically for you (backups, upgrades, etc.). There are IOs required to do each copy, so there are limits in place because the machine does not have infinite IOPS. (Those limits may change a bit over time as we work to improve the service, get newer hardware, etc.)
In terms of what options you have, you could:
Restore N databases from a database backup (which would still have IO limits but they may be higher for you depending on your reservation size)
Consider models to copy in parallel using a single source to hierarchically create what you need (copy 2 from one, then copy 2 from each of the ones you just copied, etc.; see the sketch below)
Stage out the copies over time based on the limits you get back from the system.
Try a larger reservation size for the source and target during the copy to get more IOPS and lower the time to perform the operations.
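To illustrate the hierarchical fan-out idea above, a rough sketch using the same CREATE DATABASE ... AS COPY OF syntax as the question; the client database names are placeholders, and each copy must finish (state ONLINE) before it can serve as a source:
-- first wave: copy from the master, staying under its concurrent-copy limit
CREATE DATABASE [Client01] AS COPY OF [BlankDBClient];
CREATE DATABASE [Client02] AS COPY OF [BlankDBClient];
-- second wave: once the first copies are ONLINE, use them as extra sources
CREATE DATABASE [Client03] AS COPY OF [Client01];
CREATE DATABASE [Client04] AS COPY OF [Client02];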
In addition to Connor's answer, you can consider having a dacpac or bacpac of that master database stored on Azure Storage, and once you have submitted 25 concurrent database copies you can start restoring the dacpac from Azure Storage.
You can also monitor how many database copies are showing COPYING in the state_desc column of the following queries. After sending the first batch of 25 copies, when those queries return fewer than 25 rows, start sending more copies until you reach the 25 limit again. Keep doing this until the queue of required copies is finished.
SELECT
    [sys].[databases].[name],
    [sys].[databases].[state_desc],
    [sys].[dm_database_copies].[start_date],
    [sys].[dm_database_copies].[modify_date],
    [sys].[dm_database_copies].[percent_complete],
    [sys].[dm_database_copies].[error_code],
    [sys].[dm_database_copies].[error_desc],
    [sys].[dm_database_copies].[error_severity],
    [sys].[dm_database_copies].[error_state]
FROM [sys].[databases]
LEFT OUTER JOIN [sys].[dm_database_copies]
    ON [sys].[databases].[database_id] = [sys].[dm_database_copies].[database_id]
WHERE [sys].[databases].[state_desc] = 'COPYING'
SELECT state_desc, *
FROM sys.databases
WHERE [state_desc] = 'COPYING'

How to make duplicating a Postgres database on the same RDS instance faster?

Thank you guys in advance.
I have a 60 GB Postgres RDS instance on AWS, and there is a databaseA inside this RDS instance. I want to make a duplicate of databaseA, called databaseB, on the same RDS server.
So basically what I tried is to run CREATE DATABASE databaseB WITH TEMPLATE databaseA OWNER postgres; This single query took 6 hours to complete, which is too slow. I see the max IOPS during the process is 120, not even close to the 10,000 IOPS limit of AWS General Purpose SSD. I have also tried tuning up work_mem, shared_buffers and effective_cache_size in the parameter group, but there was no improvement at all.
My last option is to just create two separate RDS instances, but it will be much easier if I can do this in one instance. I'd appreciate any suggestions.
(The instance class is db.m4.xlarge)
As mentioned by Matt, you have two options:
Increase your server size, which will give you more IOPS.
Switch to provisioned IOPS.
As this is a temporary requirement, I would go with option 1, because you can upgrade to the maximum available server size --> do the database copy --> downgrade the DB server seamlessly, and it won't take much time. Switching from SSD to provisioned IOPS will take a lot of time because it needs to convert your data, and hence more downtime. And later, when you switch back from provisioned IOPS to SSD, it will again take time.
Note that both 1 & 2 are expensive if used for the long term (if you don't really need them), so you can't leave it as is.

2 SQL servers but different tempdb IO pattern; on 1 it spikes up and down between 0.2 MB/sec and 5 MB/sec

I have 2 MSSQL servers (let's call them SQL1 and SQL2) running a total of 1866 databases.
SQL1 has 993 databases (993203 registered users)
SQL2 has 873 databases (931259 registered users)
Each SQL server has a copy of an InternalMaster database (for some shared table data) and then multiple customer databases, 1 database per customer (customer/client, not registered user).
At the time of writing this we had just over 10,000 users online using our software.
SQL2 behaves as expected: Database I/O is generally 0.2 MB/sec and goes up and down in a normal flow; I/O goes up on certain reports and queries and so on, in a random fashion.
However, SQL1 has a constant pattern, almost like a life support machine.
I don't understand why the two servers, which have the same infrastructure, behave so differently. The spike starts at around 2 MB/sec and then increases to a max of around 6 MB/sec. Both servers have identical IOPS provisions for the data, log and transaction partitions and identical AWS specs. The Data File I/O view shows that tempdb is the culprit of this spike.
Any advice would be great, as I just can't get my head around how one tempdb would act differently from another when running the same software and setup on both servers.
Regards
Liam
Liam,
Please see this website, which explains how to configure tempdb. Looking at the image, you only have one data file for the tempdb database.
http://www.brentozar.com/sql/tempdb-performance-and-configuration/
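As a first step you can confirm the current layout and, if there really is only one data file, add more equally sized files as that article describes. A hedged sketch; the file path and sizes below are assumptions to adapt to your servers:
-- how many tempdb data files are there, and how big are they?
SELECT name, physical_name, size * 8 / 1024 AS size_mb
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';

-- add an additional, equally sized data file (repeat per extra file; path/size are placeholders)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdb2.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);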
Hope this helps