Spark SQL - Performance Diagnostics

I am using Spark SQL but some queries are very slow. I would like to know how I can get some insights about why the queries are slow so that I can try to optimise the system.

Check the locality level and timing of each stage in the driver's web UI. You can usually find it on port 4040 of the driver.
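If you would rather script this than click through the UI, here is a minimal sketch. It assumes you have already fetched the task records for a stage as JSON from the driver's monitoring REST API (under /api/v1 on the same port), and that each record carries `taskLocality` and `duration` fields. Those field names are my assumption and can differ between Spark versions, so check them against your own output. The sample records are hypothetical.

```python
from collections import defaultdict

def summarize_locality(tasks):
    """Group task records by locality level.

    Returns {locality_level: (task_count, mean_duration_ms)}.
    """
    buckets = defaultdict(list)
    for task in tasks:
        # Field names are assumptions; adjust to what your Spark version returns.
        level = task.get("taskLocality", "UNKNOWN")
        buckets[level].append(task.get("duration", 0))
    return {
        level: (len(durations), sum(durations) / len(durations))
        for level, durations in buckets.items()
    }

# Hypothetical task records, shaped like entries from a stage's task list.
sample_tasks = [
    {"taskId": 0, "taskLocality": "PROCESS_LOCAL", "duration": 120},
    {"taskId": 1, "taskLocality": "PROCESS_LOCAL", "duration": 80},
    {"taskId": 2, "taskLocality": "ANY", "duration": 900},
]

for level, (count, mean_ms) in sorted(summarize_locality(sample_tasks).items()):
    print(f"{level}: {count} task(s), mean {mean_ms:.0f} ms")
```

A stage where many tasks run at RACK_LOCAL or ANY and take much longer than the PROCESS_LOCAL ones is a hint that data is being pulled over the network.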

Related

SQL Queries tuning into CRM

I have a question regarding MS Dynamics CRM and SQL.
We have CRM 2015 on-premises, and we get a lot of warning messages in the Windows application log saying that such-and-such queries exceeded the threshold value. Eventually a bunch of these warnings leads to an error in the form of a failed workflow or plugin.
We asked Microsoft, and they recommended using SQL Server Tuning Advisor and Profiler to optimize those queries. When we ran them, the results said that each query could be improved by 54% or 65% by applying the suggested changes, most of which are indexes.
My question is: how do I create, modify, or apply these recommendations in a running, live CRM application?
Any guidance will be really helpful.
Thanks.
Creating indexes is an expensive operation, because the server has to build the index for the existing records.
If you could do it in the evenings, and you can afford having some downtime, go for that option. If you have a SQL Server cluster / high availability solution, it would be best to do it for each node separately to avoid downtime.

Azure SQL Database vs. MS SQL Server on Dedicated Machine

I'm currently running an instance of MS SQL Server 2014 (12.1.4100.1) on a dedicated machine I rent for $270/month with the following specs:
Intel Xeon E5-1660 processor (six physical 3.3 GHz cores + hyperthreading + turbo to 3.9 GHz)
64 GB registered DDR3 ECC memory
240GB Intel SSD
45000 GB of bandwidth transfer
I've been toying around with Azure SQL Database for a bit now, and have been entertaining the idea of switching over to their platform. I fired up an Azure SQL Database using their P2 Premium pricing tier on a V12 server (just to test things out), and loaded a copy of my existing database (from the dedicated machine).
I ran several sets of queries side-by-side, one against the database on the dedicated machine, and one against the P2 Azure SQL Database. The results were sort of shocking: my dedicated machine outperformed (in terms of execution time) the Azure db by a huge margin each time. Typically, the dedicated db instance would finish in under 1/2 to 1/3 of the time that it took the Azure db to execute.
Now, I understand the many benefits of the Azure platform. It's managed vs. my non-managed setup on the dedicated machine, they have point-in-time restore better than what I have, the firewall is easily configured, there's geo-replication, etc., etc. But I have a database with hundreds of tables with tens to hundreds of millions of records in each table, and sometimes need to query across multiple joins, etc., so performance in terms of execution time really matters. I just find it shocking that a ~$930/month service performs that poorly next to a $270/month dedicated machine rental. I'm still pretty new to SQL as a whole, and very new to servers/etc., but does this not add up to anyone else? Does anyone perhaps have some insight into something I'm missing here, or are those other, "managed" features of Azure SQL Database supposed to make up the difference in price?
Bottom line is I'm beginning to outgrow even my dedicated machine's capabilities, and I had really been hoping that Azure's SQL Database would be a nice, next stepping stone, but unless I'm missing something, it's not. I'm too small of a business still to go out and spend hundreds of thousands on some other platform.
Does anyone have any advice on whether I'm missing something, or is the performance I'm seeing in line with what you would expect? Do I have any other options that can produce better performance than the dedicated machine I'm running currently, but don't cost tens of thousands per month? Is there something I can do (a configuration or setting) for my Azure SQL Database that would improve execution time? Again, any help is appreciated.
EDIT: Let me revise my question to maybe make it a little more clear: is what I'm seeing in terms of sheer execution time performance to be expected, where a dedicated server @ $270/month is well outperforming Microsoft's Azure SQL DB P2 tier @ $930/month? Ignore the other "perks" like managed vs. unmanaged, ignore intended use like Azure being meant for production, etc. I just need to know if I'm missing something with Azure SQL DB, or if I really am supposed to get MUCH better performance out of a single dedicated machine.
(Disclaimer: I work for Microsoft, though not on Azure or SQL Server).
"Azure SQL" isn't equivalent to "SQL Server" - and I personally wish that we did offer a kind of "hosted SQL Server" instead of Azure SQL.
On the surface the two are the same: they're both relational database systems that you query with T-SQL (and under the hood they both use the same DBMS).
Azure SQL is different in that the idea is that you have two databases: a development database using a local SQL Server (ideally 2012 or later) and a production database on Azure SQL. You (should) never modify the Azure SQL database directly, and indeed you'll find that SSMS does not offer design tools (Table Designer, View Designer, etc) for Azure SQL. Instead, you design and work with your local SQL Server database and create "DACPAC" files (or special "change" XML files, which can be generated by SSDT) which then modify your Azure DB such that it copies your dev DB, a kind of "design replication" system.
Otherwise, as you noticed, Azure SQL offers built-in resiliency, backups, simplified administration, etc.
As for performance, is it possible you were missing indexes or other optimizations? You also might notice slightly higher latency with Azure SQL compared to a local SQL Server. I've seen ping times (from an Azure VM to an Azure SQL host) of around 5-10 ms, which means you should design your application to be less chatty or to parallelise data retrieval operations in order to reduce page load times (assuming this is a web application you're building).
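To make the latency point concrete, here is a rough back-of-the-envelope sketch. It counts only network round-trip wait and ignores query execution time entirely. The function name, the 40-query page, and the concurrency of 8 are hypothetical. The 10 ms round trip is the upper end of the ping range mentioned above.

```python
def latency_overhead_ms(num_queries, round_trip_ms, concurrency=1):
    """Estimate network wait only.

    Queries are issued in waves of `concurrency`, and each wave
    costs one round trip.
    """
    waves = -(-num_queries // concurrency)  # ceiling division
    return waves * round_trip_ms

# Hypothetical page that issues 40 small queries at 10 ms per round trip.
print(latency_overhead_ms(40, 10))     # 400: one query at a time
print(latency_overhead_ms(40, 10, 8))  # 50: eight queries in flight at once
print(latency_overhead_ms(1, 10))      # 10: everything batched into one request
```

Under these assumptions the chatty version spends 400 ms per page just waiting on the network, which would be barely noticeable against a database on the same machine.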
Perf and availability aside, there are several other important factors to consider:
Total cost: your $270 rental cost is only one of many cost factors. Space, power and HVAC are other physical costs. Then there's the cost of administration. Think of the work you have to do each Patch Tuesday and whenever either Windows or SQL Server ships a service pack or cumulative update. Even if you don't test them before rolling out, it still takes time and effort. If you do test, then there's a second machine and the work of duplicating the production instance and workload for testing.
Security: there is a LOT written about how bad and dangerous and risky it is to store any data you care about in the cloud. Personally, I've seen way worse implementations and processes on security with local servers (even in banks and federal agencies) than I've seen with any of the major cloud providers (Microsoft, Amazon, Google). It's a lot of work getting things right then even more work keeping them right. Also, you can see and audit their security SLAs (See Azure's at http://azure.microsoft.com/en-us/support/trust-center/).
Scalability: not just raw scalability but the cost and effort to scale. Azure SQL DB recently released the huge P11 edition which has 7x the compute capacity of the P2 you tested with. Scaling up and down is not instantaneous but really easy and reasonably quick. Best part is (for me anyway), it can be bumped to some higher edition when I run large queries or reindex operations then back down again for "normal" loads. This is hard to do with a regular SQL Server on bare metal - either rent/buy a really big box that sits idle 90% of the time or take downtime to move. Slightly easier if in a VM; you can increase memory online but still need to bounce the instance to increase CPU; your Azure SQL DB stays online during scale up/down operations.
There is an alternative from Microsoft to Azure SQL DB:
“Provision a SQL Server virtual machine in Azure”
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-provision-sql-server/
A detailed explanation of the differences between the two offerings: “Understanding Azure SQL Database and SQL Server in Azure VMs”
https://azure.microsoft.com/en-us/documentation/articles/data-management-azure-sql-database-and-sql-server-iaas/
One significant difference between your standalone SQL Server and Azure SQL DB is that with SQL DB you are paying for high levels of availability, which is achieved by running multiple instances on different machines. This would be like renting 4 of your dedicated machines and running them in an AlwaysOn Availability Group, which would change both your cost and performance. However, as you never mentioned availability, I'm guessing this isn't a concern in your scenario. SQL Server in a VM may better match your needs.
SQL DB has built-in availability (which can impact performance), point-in-time restore capability and DR features. You have the option to scale your DB up or down based on your usage to reduce the cost. You can improve your query performance by sharding your data and using global queries across the shards. SQL DB manages automatic upgrades and patching, which greatly improves the manageability story. You may need to pay a little premium for that. Application-level caching, evenly distributing the load, downgrading when the database is cold, etc. may help improve your database performance and optimize the cost.

how does the caching process work in ASE 15.0.3

I am monitoring a Sybase server (ASE 15.0.3) for its performance. One of the things being monitored is the cached data, but I want to understand how the caching process really works in ASE 15.0.3. Can one instance of ASE 15.0.3 cache statements running in another instance, or is the caching limited to its own instance? And which tables are used in the caching process, both in the case of self-caching and of cross-caching?
NOTE: by performance I mean the full set of performance tuning supported by ASE 15.0.3.
By default ASE uses the 'default data cache' as the data cache and the 'procedure cache' to cache stored procedure query plans. As of ASE 15.x (when not in compatibility mode), the 'statement cache' can also be used to cache the query plans of SQL statements that are re-used a lot.
I use sp_sysmon to collect server-wide performance measures and read the following doc to optimize my data cache:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc20020.1502/html/basics/X56939.htm
The very useful MDA tables generate too much overhead on my system, so I try to use them only temporarily to collect performance measures.
Hope this helps.

Map out SQL Server 2008 usage? (from logs)

All-
I'm trying to determine which SQL databases are currently being used the most (as well as what applications are requesting information from them).
Is there a log analyzing tool? Or something built into SQL server that could help me achieve this?
Ideally I'd like to show a map of server usage and understand which applications are actually hitting them.
Thanks!
sys.dm_db_index_usage_stats shows exactly how many times each index/table was read/scanned/updated since the server started up. This is the most important piece of information, since everything else (IO, RAM, CPU) can ultimately be traced to these operations. The one piece of information not revealed here is blocking and contention, for which a good starting point is sys.dm_os_wait_stats. And finally there is sys.dm_exec_query_stats, which will drill down to the CPU and execution times of individual queries.
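As a starting point for the "which databases are used the most" part, something along these lines may help. It is only a sketch: the T-SQL sums the user_seeks, user_scans, user_lookups and user_updates columns of sys.dm_db_index_usage_stats per database, and the Python helper ranks whatever rows come back. The helper name and the sample rows are hypothetical, and actually running the query (connection, driver) is left out.

```python
# T-SQL to run against the server. The counters reset when the server restarts.
USAGE_QUERY = """
SELECT DB_NAME(database_id) AS database_name,
       SUM(user_seeks + user_scans + user_lookups) AS reads,
       SUM(user_updates) AS writes
FROM sys.dm_db_index_usage_stats
GROUP BY database_id
ORDER BY reads DESC;
"""

def rank_by_activity(rows):
    """Sort (database_name, reads, writes) rows by total operations, busiest first."""
    return sorted(rows, key=lambda row: row[1] + row[2], reverse=True)

# Hypothetical result rows, shaped like the output of USAGE_QUERY.
sample_rows = [("Sales", 1200, 300), ("Archive", 40, 2), ("Orders", 900, 1100)]

for name, reads, writes in rank_by_activity(sample_rows):
    print(f"{name}: {reads} reads, {writes} writes")
```

Note that this view only tells you which databases are busy, not which applications are hitting them.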
If you right-click on the server in Management Studio you will see a 'Reports' option. There are a lot of built-in reports which might give you what you need (the 'Server Dashboard' report in particular shows which databases are consuming the most CPU and I/O).
Alternatively the Profiler provides a lot of (perhaps too much) valuable data.

Selecting SQL Server 2008 metrics from a database?

First off, I'm not even sure this is possible. One of my co-workers is requesting that I help him retrieve performance metrics from a Microsoft SQL Server 2008 database using a remote connection and a SQL query.
Specifically, we are looking for stuff like database memory usage and CPU usage. Is this stored in a table somewhere that I can easily just SELECT it from?
I've tried googling this but mainly all I come up with are ads for products that do SQL performance monitoring. I realize we could use perfmon in Windows to get this data, but that's not what he's looking for. Also remote WMI gathering of perfmon metrics is out. It has to be a remote SQL query - some product limitation I won't get into in detail. :)
Even a definitive "This is not possible" is a valid answer.
Thanks in advance.
There is DBCC MEMORYSTATUS to get a ton of memory information.
DBCC statements in general are full of useful information about your SQL Server.
SO Answer on how you can "build" your own taskmon for SQL Server.
Another great source for information about server state are Dynamic Management Views.
Knock yourself out.
To get the sort of info you want, you'll need to use SQL Server's performance/system monitor counters. See the MSDN article Monitoring Resource Usage (System Monitor) for details:
"If you are running Microsoft Windows server operating system, use the System Monitor graphical tool to measure the performance of SQL Server. You can view SQL Server objects, performance counters, and the behavior of other objects, such as processors, memory, cache, threads, and processes. Each of these objects has an associated set of counters that measure device usage, queue lengths, delays, and other indicators of throughput and internal congestion."
[And yes... you can access performance counters remotely (assuming you have the requisite permissions).]