Migrating from Azure SQL Database to Data Warehouse

I currently have an Azure SQL database in which the data is all in a star schema (fact/dim tables with columnstore indexes) and which is used exclusively for a reporting app. We currently use a Premium database instance with 250 DTUs, and it is about 150 GB in size and growing all the time.
For a similar price I could create a SQL Data Warehouse instance with 100 DWUs. My concern is that, at only 100 DWUs vs 250 DTUs, I would actually see a performance reduction.
I know that DWUs and DTUs are not directly comparable, but can anyone tell me if I am likely to see a performance boost/reduction in these circumstances?

For what it's worth, 1 DWU = 7.5 DTUs with respect to server capacity, as explained here.
Looking at the server capacity that a DW instance is provisioned on:
A 100 DWU instance consumes 750 DTUs of server capacity. That is 500 DTUs more than the 250 DTUs associated with the Azure SQL Database Premium tier you currently have.
A 400 DWU instance consumes 3,000 DTUs of server capacity.
Take into consideration that you get less concurrency with Azure SQL Data Warehouse.
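As a rough way to check whether the 250 DTUs are actually a bottleneck today, the sys.dm_db_resource_stats DMV in Azure SQL Database reports resource use as a percentage of the current tier's limits; a minimal query, just as a way to inform the comparison:

    -- Roughly the last hour of resource use, collected at 15-second intervals,
    -- reported relative to the current service tier's limits
    SELECT end_time,
           avg_cpu_percent,
           avg_data_io_percent,
           avg_log_write_percent,
           avg_memory_usage_percent
    FROM sys.dm_db_resource_stats
    ORDER BY end_time DESC;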

Related

CETAS times out for large tables in Synapse Serverless SQL

I'm trying to create a new external table using a CETAS (CREATE EXTERNAL TABLE AS SELECT * FROM <table>) statement from an already existing external table in Azure Synapse Serverless SQL Pool. The table I'm selecting from is a very large external table built on around 30 GB of data in Parquet format stored in ADLS Gen2 storage, but the query always times out after about 30 minutes. I've tried using premium storage and also tried most, if not all, of the suggestions made here, but it didn't help and the query still times out.
The error I get in Synapse Studio is:
Statement ID: {550AF4B4-0F2F-474C-A502-6D29BAC1C558} | Query hash: 0x2FA8C2EFADC713D | Distributed request ID: {CC78C7FD-ED10-4CEF-ABB6-56A3D4212A5E}. Total size of data scanned is 0 megabytes, total size of data moved is 0 megabytes, total size of data written is 0 megabytes. Query timeout expired.
The core use case is that assuming I only have the external table name, I want to create a copy of the data over which that external table is created in Azure storage itself.
Is there a way to resolve this timeout issue or a better way to solve the problem?
This is a limitation of serverless SQL pool.
Query timeout expired
The error Query timeout expired is returned if the query executed for more than 30 minutes on serverless SQL pool. This is a limit of serverless SQL pool that cannot be changed. Try to optimize your query by applying best practices, or try to materialize parts of your queries using CETAS. Check if there is a concurrent workload running on the serverless pool, because the other queries might take the resources. In that case you might split the workload over multiple workspaces.
Self-help for serverless SQL pool - Query Timeout Expired
The core use case is that assuming I only have the external table name, I want to create a copy of the data over which that external table is created in Azure storage itself.
That's simple to do with a Data Factory copy job, a Spark job, or AzCopy.
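If you would rather stay in serverless SQL, the docs' suggestion to materialize parts of the query with CETAS can be applied by splitting the copy into several smaller CETAS statements, each of which stays under the 30-minute limit. A rough sketch only; the data source, file format, and partitioning column names below are placeholders, and the external data source and file format objects would need to exist in your workspace:

    -- One chunk of the copy; repeat with a different LOCATION and filter range
    -- until the whole table is covered.
    -- MyAdlsDataSource, ParquetFormat and partition_key are placeholder names.
    CREATE EXTERNAL TABLE dbo.MyTable_Chunk1
    WITH (
        LOCATION = 'copy/mytable/chunk1/',
        DATA_SOURCE = MyAdlsDataSource,
        FILE_FORMAT = ParquetFormat
    )
    AS
    SELECT *
    FROM dbo.MyLargeExternalTable
    WHERE partition_key >= 1 AND partition_key < 1000000;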

SQL Server - Memory quota error during migration to in-memory table

We are currently migrating to in-memory tables on SQL Server 2019 Standard Edition. The disk-based table is 55 GB of data + 54 GB of indexes (71M records). RAM is 900 GB. But during the data migration (an INSERT statement) we get an error message:
Msg 41823, Level 16, State 109, Line 150
Could not perform the operation because the database has reached its quota for in-memory tables. This error may be transient. Please retry the operation.
The in-memory file is set to “unlimited”, so this looks strange, since SQL Server 2019 should not have any size restrictions for in-memory tables.
Why do you think the in-memory data size in a single mem-opt table is unlimited on Standard Edition?
From Memory Limits in SQL Server 2016 SP1 (all of which still applies according to 2019 docs):
Each user database on the instance can have an additional 32GB allocated to memory-optimized tables, over and above the buffer pool limit.
So, you can do what you want, I suppose, but you'll have to spread it across multiple databases. You won't be able to store more than 32GB in a single mem-opt table or even in multiple mem-opt tables in a single database.
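If you want to see how much of the per-database quota the migration has already consumed, the sys.dm_db_xtp_table_memory_stats DMV reports memory allocated and used by each memory-optimized table in the current database; a quick check, run in the target database:

    -- Memory allocated/used by memory-optimized tables in the current database
    SELECT OBJECT_NAME(object_id) AS table_name,
           memory_allocated_for_table_kb / 1024 AS table_allocated_mb,
           memory_used_by_table_kb / 1024 AS table_used_mb,
           memory_allocated_for_indexes_kb / 1024 AS index_allocated_mb
    FROM sys.dm_db_xtp_table_memory_stats;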

How to make duplicating a Postgres database on the same RDS instance faster?

Thank you guys in advance.
I have a 60 GB Postgres RDS instance on AWS, and there is a databaseA inside this RDS instance. I want to make a duplicate of databaseA, called databaseB, on the same RDS server.
Basically, what I tried is to run CREATE DATABASE databaseB WITH TEMPLATE databaseA OWNER postgres; This single query took 6 hours to complete, which is too slow. I see the max IOPS during the process is 120, not even close to the 10,000 IOPS limit of AWS's general-purpose SSD. I have also tried tuning up work_mem, shared_buffers, and effective_cache_size in the parameter group; there was no improvement at all.
My last option is to just create two separate RDS instances, but it would be much easier if I could do this in one instance. I'd appreciate any suggestions.
(The instance class is db.m4.xlarge)
As mentioned by Matt, you have two options:
Increase your server size, which will give you more IOPS.
Switch to provisioned IOPS.
As this is a temporary requirement, I would go with option 1, because you can upgrade to the largest available server, do the database copy, and then downgrade the DB server seamlessly, and it won't take much time. Switching from SSD to provisioned IOPS will take a lot of time because it needs to convert your data, and hence more downtime. And later, when you switch back from provisioned IOPS to SSD, it will again take time.
Note that both 1 and 2 are expensive (if you don't really need them) when used long term, so you can't leave it as is.
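For what it's worth, whichever instance size you run the copy on, the template-based copy itself requires that no other session is connected to databaseA while it runs; a sketch of that step, assuming you have sufficient privileges to terminate other sessions:

    -- CREATE DATABASE ... TEMPLATE fails if any other session is connected to
    -- the source database, so disconnect them first.
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE datname = 'databaseA'
      AND pid <> pg_backend_pid();

    CREATE DATABASE databaseB WITH TEMPLATE databaseA OWNER postgres;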

2 SQL Servers but different tempdb I/O patterns: one spikes up and down between 5 MB/sec and 0.2 MB/sec

I have 2 MSSQL servers (let's call them SQL1 and SQL2) running a total of 1,866 databases.
SQL1 has 993 databases (993,203 registered users)
SQL2 has 873 databases (931,259 registered users)
Each SQL server has a copy of an InternalMaster database (for some shared table data) and then multiple customers, one database per customer (customer/client, not registered user).
At the time of writing this we had just over 10,000 users online using our software.
SQL2 behaves as expected: database I/O is generally 0.2 MB/sec and goes up and down in a normal flow, with I/O rising on certain reports, queries and so on in a random fashion.
However, SQL1 has a constant pattern, almost like a life-support machine.
I don't understand why two servers with the same infrastructure work so differently. The spikes start at around 2 MB/sec and then increase to a max of around 6 MB/sec. Both servers have identical IOPS provisioned for the data, log and transaction partitions, and identical AWS specs. The Data File I/O view shows that tempdb is the culprit of this spike.
Any advice would be great, as I just can't get my head around how one tempdb would behave differently from another when running the same software and setup on both servers.
Regards
Liam
Liam,
Please see this website, which explains how to configure tempdb. Looking at the image, you only have one data file for the tempdb database.
http://www.brentozar.com/sql/tempdb-performance-and-configuration/
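As a concrete illustration of adding more tempdb data files (the file name, path, and sizes below are placeholders, so adjust them to your environment), it is one ALTER DATABASE per file, and the files should be sized equally:

    -- Add a second tempdb data file; repeat for as many files as needed
    -- (a common starting point is one data file per CPU core, up to 8).
    ALTER DATABASE tempdb
    ADD FILE (
        NAME = tempdev2,
        FILENAME = 'T:\TempDB\tempdev2.ndf',
        SIZE = 8GB,
        FILEGROWTH = 1GB
    );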
Hope this helps

Concurrent Reads from Highly Transactional Table

Currently, I have a highly transactional database with approximately 100,000 inserts daily. Do I need to be concerned if I start allowing a large number of concurrent reads from my main transaction table? I am not concerned about concurrency so much as performance.
At present there are 110+ million transactions in this table, and I am using SQL Server 2005.
In 2002, a Dell server with 2 GB of RAM and a 1.3 GHz CPU served 25 concurrent users as a file server, a database server, and an ICR server (very CPU-intensive). The users and the ICR server continuously inserted, read and updated one data table with 80+ million records, where each operation required 25 to 50 insert or update statements. It worked like a charm 24/7 for almost a year. If you use decent indexes, and your selects use those indexes, it will work.
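To make the point about decent indexes concrete, here is a hypothetical covering index for a common read pattern on the main transaction table; the table and column names are made up and would need to match your actual query predicates:

    -- Hypothetical covering index: serves reads filtered by date and customer
    -- from the nonclustered index alone, thanks to the INCLUDE columns.
    CREATE NONCLUSTERED INDEX IX_Transactions_Date_Customer
    ON dbo.Transactions (TransactionDate, CustomerID)
    INCLUDE (Amount, Status);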
As @huadianz proposed, a read-only copy will do even better.