How to troubleshoot suspended queries in Azure Synapse?

Currently, I am encountering an issue with suspended queries in Azure Synapse when executing stored procedure calls from ADF.
I followed the suggestions in the link below for troubleshooting the issue:
(Link deleted due to sensitive information.)
The troubleshooting queries returned the results below.
I checked whether a transaction lock was the issue and killed a few suspended or running queries that had been running for more than 15 hours. I also checked the rest of the running queries, but nothing there would cause a transaction lock. I tried to run the stored procedure that gets blocked (as mentioned above) manually from Azure Data Studio, and it took 40 seconds to complete.
When the same query runs from ADF and gets suspended, it takes nearly an hour to finish.
Any suggestion to troubleshoot this issue is much appreciated.
Thanks

There are a number of factors you must always consider when tuning queries in Azure Synapse Analytics dedicated SQL pools:
DWU - what DWU is your pool at? Lower DWUs mean fewer concurrent queries and lower performance, and are not suitable for any kind of performance tuning. Crank it up temporarily to rule this out as a problem, bearing in mind that changing it disconnects any active queries. Also bear in mind that not all queries respond to a higher DWU.
Resource class - what resource class is associated with the user executing these queries? Remember the default is smallrc, and the admin user always has smallrc. Understand static and dynamic resource classes. The DMV sys.dm_pdw_exec_requests will give you useful information on this (see the example query after this list). Experiment with your workload to find the sweet spot between performance and concurrency versus resource class. Encourage your dev team to use labels in their queries: OPTION ( LABEL = 'some informative label' )
Table geometry - this is the distribution (ROUND_ROBIN|HASH|REPLICATE) of your table and the indexing choice (CLUSTERED COLUMNSTORE|CLUSTERED INDEX|HEAP). Clustered columnstore and round robin are the defaults but they are not always appropriate. Consider what is appropriate for your tables.
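For example, a minimal check along these lines (a sketch; the label value is whatever your dev team applies with OPTION ( LABEL = ... )) shows the status, resource class and elapsed time of recent requests:

    SELECT request_id,
           [status],            -- 'Suspended' means the request is waiting, e.g. for a concurrency slot
           resource_class,
           total_elapsed_time,  -- milliseconds
           [label],
           command
    FROM   sys.dm_pdw_exec_requests
    WHERE  [label] = 'some informative label'
    ORDER  BY submit_time DESC;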
If you work through those points and still have an issue, you can start to look at statistics and workload classification for starters, but gathering information on the points above should give you a good idea.
If you are just doing single-value INSERTs, then don't. Dedicated SQL pools are terrible at these. Convert them to a load from a file in a single INSERT or COPY INTO.
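As a sketch only (the table name, storage path and authentication are placeholders you would replace with your own), a file-based load looks like this:

    COPY INTO dbo.StagingTable
    FROM 'https://<storageaccount>.blob.core.windows.net/<container>/<folder>/*.csv'
    WITH (
        FILE_TYPE  = 'CSV',
        FIRSTROW   = 2,                               -- skip the header row
        CREDENTIAL = (IDENTITY = 'Managed Identity')  -- or a SAS / storage key credential
    );

One statement like this lands the whole file in a single distributed load instead of thousands of tiny transactions.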

Related

Singlestore (MemSQL)

I have a Singlestore (previously MemSQL) cloud database set up.
My software is running in the background, constantly writing to a table.
When I try to query this table, it takes 10+ seconds. When the software is shut off, the query takes milliseconds.
What would be the reason for this? And is there anything that can be done to mitigate against this?
From a high level, cluster resources are much more heavily utilized while the background software is constantly writing to the table. The same resources that handle the constant writes are concurrently trying to serve the query, so it makes sense that it's faster when there is no writing.
A 'knob to turn' WRT database ingest performance is partition count - you can try creating a test DB with more partitions than the current DB (say 2x more), as sketched below. Then try querying the test DB both while the background software is running and while it is not, and compare this to the DB with fewer partitions.
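A rough sketch of that experiment (the database names are placeholders, and the right partition count depends on your cluster size):

    -- See how many partitions the current database has
    SHOW PARTITIONS ON my_db;

    -- Create a test database with roughly double the partitions,
    -- load a copy of the table into it, and repeat the query test
    CREATE DATABASE my_db_test PARTITIONS 32;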
For general guidance on troubleshooting query performance, see this section of the docs: https://docs.singlestore.com/managed-service/en/query-data/query-procedures/troubleshooting-poorly-performing-queries.html
If you're an active customer, you can file a support ticket for the issue to get some additional analysis of the backend workings.

Query Terminating in Redshift

We are migrating our database from SQL Server 2012 to Amazon Redshift.
The front end of our application is developed in MicroStrategy (MSTR) which fires the queries on Redshift.
Although the application is working fine in Production (on SQL Server 2012), we have run into a strange issue in our PoC Environment on Redshift.
When we kicked off a dashboard in MSTR, the query from the dashboard hits Redshift and it completes successfully without any issues.
But when we stress-test the application by running all the dashboards simultaneously, that particular dashboard's query terminates in Redshift. The database does not throw any error message, which is why we cannot troubleshoot why the query is terminating.
Can anyone please suggest how we should go about solving this problem?
Thank you
The problem might be that the queue you are sending the query to has a timeout configured in your WLM configuration.
Redshift is designed differently from other databases; it is optimized for analytical queries. For that reason it doesn't cache query results, as an OLTP database would. The other difference is that you have a predefined concurrency level (also part of WLM - http://docs.aws.amazon.com/redshift/latest/mgmt/workload-mgmt-config.html). Each concurrency slot has its allocated resources to complete big queries quickly, but this limits the number of concurrent queries that can run. The default configuration is 5, and you can increase it up to 50. The recommendation is to increase it to no more than 15-20, since with 50 each query gets only 2% of the cluster resources, instead of 20% (with 5) or 5% (with 20).
The combination of these two differences is: if you are connecting many dashboards, each one sends its queries to Redshift and competes over the resources (without caching, each query will run again and again), and might time out or just be too slow for an interactive dashboard.
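To confirm which of these is happening (queries being terminated by a queue timeout versus simply queuing for a long time), a couple of system-table checks along these lines can help (a sketch; adjust the time window to your stress test):

    -- Was the query aborted, and how long did it sit in the WLM queue?
    SELECT q.query,
           q.starttime,
           q.aborted,                                 -- 1 = cancelled/terminated
           w.service_class,
           w.total_queue_time / 1000000.0 AS queue_seconds,
           w.total_exec_time  / 1000000.0 AS exec_seconds
    FROM   stl_query q
    JOIN   stl_wlm_query w ON w.query = q.query
    WHERE  q.starttime > dateadd(hour, -1, getdate())
    ORDER  BY q.starttime DESC;

    -- If you use WLM query monitoring rules, this shows which rule fired and what it did
    SELECT query, rule, action, recordtime
    FROM   stl_wlm_rule_action
    ORDER  BY recordtime DESC;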
Please make sure that you are using the Redshift-optimized drivers for MicroStrategy, which send queries to Redshift under the above assumptions.
You can also consider putting an RDS instance between your dashboards and Redshift, holding the aggregated data that you need for your dashboards; it can provide in-memory caching and higher concurrency on that summary data. There is an interesting pattern you can implement with pgbouncer (see here) that lets you send some queries (the analytical ones) to Redshift, and others (the aggregated dashboard ones) to a PostgreSQL instance.

SQL Server running slow

I have SQL job that runs every night which does various inserts/updates/deletes. The job contains 40 steps which mainly execute stored procedures.
It's been running fine up until a week ago, when suddenly the run time went up from 2.5 hours to over 5 hours, sometimes even 8, 9, or 10!
Could one of you please give me any pointers?
First of all, let me recommend you a valuable resource on the Simple-Talk site. It is a detailed methodology of how to troubleshoot performance issues on SQL Server.
Was the insert you mention a huge bulk insert that could affect performance? If it was a huge load, the query execution plans could be different and you may need to re-tune your table structure, indexes, etc.
If the run time suddenly changed and no changes were made to the queries or your database structure, then I would ask myself several questions:
First, is the process still taking this long, or did it run slowly only once? Maybe it is now running smoothly and the issue only arose once. Nevertheless, try to find what triggered that bad performance (the job-history query at the end of this answer can help); it can happen again and take down your server.
Is the server a dedicated SQL Server machine? If not, check whether some new tasks unrelated to the SQL engine have been configured; maybe a new task is doing some heavy I/O work and therefore your CRUD operations take longer.
If it is a dedicated server, then check that no new job has been added that could slow down your existing jobs. Check this SO link for details on jobs set up from the SQL Agent.
Maybe there is low memory due to another process on the same server?
And there is a lot more to check, but before going deeper I would verify that no external (non-SQL Server related) factor is the reason for the delay in the process execution.
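As a concrete starting point, a sketch like the following (the job name is a placeholder for your own) pulls the per-step history of the nightly job from msdb, so you can see which of the 40 steps is the one that suddenly got slower:

    -- run_duration is an integer encoded as HHMMSS
    SELECT j.name        AS job_name,
           h.step_id,
           h.step_name,
           h.run_date,                -- integer encoded as YYYYMMDD
           h.run_duration
    FROM   msdb.dbo.sysjobhistory h
    JOIN   msdb.dbo.sysjobs j ON j.job_id = h.job_id
    WHERE  j.name = 'NightlyLoad'     -- placeholder: your job's name
      AND  h.step_id > 0              -- step 0 is the overall job outcome row
    ORDER  BY h.run_date DESC, h.step_id;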

What is "Excessive resource usage" in SQL Azure?

I searched online for a while about what "Excessive resource usage" means on SQL Azure, but I still cannot get a clear idea.
Some articles suggest that queries that take too long, use too much memory, etc. will cause "Excessive resource usage". But if I use simple queries and a simple data structure, what will happen?
For example: I use a 1 GB SQL Azure database for session state. Since each session is a very small string that is saved and deleted all the time, I don't think it will grow to 1 GB even for millions of simultaneous sessions. You can calculate it: 1 million sessions at 20 characters each take only about 20 MB of space, and with a 20-minute expiry and so on it cannot even come close to 1 GB. But there should be lots and lots of queries, each one very simple and fast thanks to an index.
I want to know whether this kind of use would be considered "Excessive resource usage". Is there any hard number that limits you on usage?
By the way, in the example above, if everything happens in the same datacenter, the only cost is the 1 GB database at $10 a month, right?
Unfortunately the answer is 'it depends'. I think that probably the best reference (with guidance) on the SQL Azure query throttle is here: TechNet Article on SQL Azure Performance. This will provide details about the metrics that are monitored and the mechanism of the throttle.
The reason I say it depends is that the throttle is non-deterministic for any given user. This is because the throttle is activated based on the total load on the node (the physical SQL Server in the Azure DC). While the subscribers who get throttled are the subscribers generating the greatest load, the level at which the throttle kicks in will depend on the total load on the node. So if you are on a quiet node (where the other tenant DBs are relatively inactive) you will be able to push through much more throughput than if you are on a busy node.
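On current Azure SQL Database the throttling model has changed, but you can at least observe how close a database is to its resource limits with a query along these lines (a sketch using sys.dm_db_resource_stats, which keeps roughly the last hour of 15-second samples):

    SELECT end_time,
           avg_cpu_percent,
           avg_data_io_percent,
           avg_log_write_percent,
           avg_memory_usage_percent
    FROM   sys.dm_db_resource_stats
    ORDER  BY end_time DESC;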
It is very appealing to use 1GB SQL Azure DBs for session state storage; you've identified the cost benefits. You are taking a risk though. One way to mitigate this risk is to partition across at least two SQL Azure 1GB DBs and adjust the load yourself based on whether one of the DBs starts hitting the throttle.
Another option, if you want determinism for throughput, is to use the Windows Azure Cache to back your session state store. The Cache has hard pre-defined limits for query throughput, so you can plan for it more easily (see the Azure Caching FAQ, including Limits). The Cache approach is probably a bit more expensive, but with a lower risk of problems.

Is it possible to get sub-1-second latency with transactional replication?

Our database architecture consists of two Sql Server 2005 servers each with an instance of the same database structure: one for all reads, and one for all writes. We use transactional replication to keep the read database up-to-date.
The two servers are very high-spec indeed (the write server has 32GB of RAM), and are connected via a fibre network.
When deciding upon this architecture we were led to believe that the latency for data to be replicated to the read server would be in the order of a few milliseconds (depending on load, obviously). In practice we are seeing latency of around 2-5 seconds even in the simplest of cases, which is unsatisfactory. By the simplest case, I mean updating a single value in a single row in a single table on the write DB and seeing how long it takes to observe the new value in the read database.
What factors should we be looking at to achieve latency below 1 second? Is this even achievable?
Alternatively, is there a different mode of replication we should consider? What is the best practice for the locations of the data and log files?
Edit
Thanks to all for the advice and insight - I believe that the latency periods we are experiencing are normal; we were misled by our DB hosting company as to what latency times to expect!
We're using the technique described near the bottom of this MSDN article (under the heading "scaling databases"), and we'd failed to deal properly with this warning:
The consequence of creating such specialized databases is latency: a write is now going to take time to be distributed to the reader databases. But if you can deal with the latency, the scaling potential is huge.
We're now looking at implementing a change to our caching mechanism that enforces reads from the write database when an item of data is considered to be "volatile".
No. It's highly unlikely you could achieve sub-1s latency times with SQL Server transactional replication even with fast hardware.
If you can get 1 - 5 seconds latency then you are doing well.
From here:
Using transactional replication, it is possible for a Subscriber to be a few seconds behind the Publisher. With a latency of only a few seconds, the Subscriber can easily be used as a reporting server, offloading expensive user queries and reporting from the Publisher to the Subscriber.
In the following scenario (using the Customer table shown later in this section) the Subscriber was only four seconds behind the Publisher. Even more impressive, 60 percent of the time it had a latency of two seconds or less. The time is measured from when the record was inserted or updated at the Publisher until it was actually written to the subscribing database.
I would say it's definitely possible.
I would look at:
Your network
Run ping commands between the two servers and see if there are any issues
If the servers are next to each other you should have < 1 ms.
Bottlenecks on the server
This could be network traffic (volume)
Network cards not being configured for 1 Gb/sec
Anti-virus or other background processes
Do some analysis on some queries and see if you can identify indexes or locking which might be a problem
See if any of the selects on the read database might be blocking the writes.
Add WITH (NOLOCK) and see if this makes a difference on one or two queries you're analyzing, for example:
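    -- Sketch only: the table and columns are made up. Compare the timings of the same
    -- report query with and without the hint; NOLOCK allows dirty reads, so treat it
    -- as a diagnostic rather than a fix.
    SELECT c.CustomerId, c.Name, c.LastOrderDate
    FROM   dbo.Customer c WITH (NOLOCK)
    WHERE  c.LastOrderDate >= '20090101';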
Essentially you have a complicated system that you are having a problem with; you need to determine which component is the problem and fix it.
Transactional replication is probably best if the reports / selects you need to run must be up to date. If they don't need to be, you could look at log shipping, although that would add some downtime with each import.
For data/log files, make sure they're on separate drives so that performance is maximized.
Something to remember about transactional replication is that a single update now requires several operations to happen before that change reaches the subscriber.
First you update the source table.
Next the Log Reader Agent sees the change and writes it to the distribution database.
Next the Distribution Agent sees the new entry in the distribution database, reads that change, and then runs the corresponding stored procedure on the Subscriber to update the row.
If you monitor the statement run times on the two servers you'll probably see that they are running in just a few milliseconds. However it is the lag time while waiting for the Log Reader and Distribution Agent to see that they need to do something which is going to kill you. You can measure where that lag accumulates with tracer tokens, as sketched below.
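A sketch of using the built-in tracer tokens (available from SQL Server 2005; the publication name is a placeholder for your own):

    -- Run at the Publisher, in the publication database
    EXEC sys.sp_posttracertoken @publication = N'MyPublication';

    -- Give the token time to flow through, then look at where the time went:
    --   distributor_latency = Publisher -> distribution database (Log Reader Agent)
    --   subscriber_latency  = distribution database -> Subscriber (Distribution Agent)
    DECLARE @tokens TABLE (tracer_id int, publisher_commit datetime);
    INSERT INTO @tokens
        EXEC sys.sp_helptracertokens @publication = N'MyPublication';

    DECLARE @tracer_id int;
    SELECT @tracer_id = MAX(tracer_id) FROM @tokens;

    EXEC sys.sp_helptracertokenhistory
         @publication = N'MyPublication',
         @tracer_id   = @tracer_id;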
If you truly need sub-second processing time, then you will want to look into writing your own processing engine to handle data moving from one server to another. I would recommend using SQL Service Broker to handle this, as that way everything is native to SQL Server and no third-party code has to be written. A minimal sketch of the plumbing follows.
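For illustration only (all names are hypothetical, and the cross-server endpoints and routes that a two-server setup needs are omitted), the core Service Broker objects look something like this:

    -- On both servers: define the message type, contract, queue and service
    CREATE MESSAGE TYPE RowChangeMessage VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT RowChangeContract (RowChangeMessage SENT BY INITIATOR);
    CREATE QUEUE RowChangeQueue;
    CREATE SERVICE RowChangeService ON QUEUE RowChangeQueue (RowChangeContract);

    -- On the write server, after committing a change, send it towards the read server
    DECLARE @handle uniqueidentifier;
    BEGIN DIALOG CONVERSATION @handle
        FROM SERVICE RowChangeService
        TO SERVICE 'RowChangeService'          -- resolved to the read server via a route
        ON CONTRACT RowChangeContract
        WITH ENCRYPTION = OFF;
    SEND ON CONVERSATION @handle
        MESSAGE TYPE RowChangeMessage (N'<row id="42" value="new value"/>');

An activation procedure bound to RowChangeQueue on the read server would then RECEIVE each message and apply the change.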