After looking in Azure's Query Performance Insight for a SQL database I use daily, I have noticed that one query is being executed many times per day (thousands per hour, it seems). The query is basically:
select * from TableName t where t.Id = id
I am trying to hunt down what is executing this query so many times in such a short space of time, and have put this together:
select top 1 text.[text], stat.last_execution_time, stat.execution_count, plans.*, stat.*
from sys.dm_exec_cached_plans as plans
join sys.dm_exec_query_stats as stat on plans.plan_handle = stat.plan_handle
cross apply sys.dm_exec_sql_text(plans.plan_handle) as text
order by stat.execution_count desc
This is showing me the query that I am looking for (as the text is the same as shown in Azure Insights) and it also shows how quickly the execution count is rising.
The issue is, I don't know how to find where this query is being executed from! None of the 3 tables I am referencing in the above query has a SessionId column, so I am unsure where to go from here!
If anyone has any experience about hunting down a query being spammed against a database over and over, please help!
This one is puzzling me. Pulling from the DB2 by itself is fast, and pulling from the table is fast, but I don't know why they don't play nicely together. I don't have access to the indices of the DB2 table or the server there.
This query takes 0.017 seconds:
select
PART_NO,
APRV_DT,
round((CURRENT_DATE - APRV_DT)/365.242199,1) as AGE,
rank() over (partition by PART_NO order by APRV_DT asc) rnk
FROM DB2_TABLE
where PART_NO in
('529711',
'627862',
'325712',
'979257',
'168570',
'004297')
Obviously I don't want to hard-code all the part numbers because I have almost 200k of them to query.
I left the part numbers in here just to try and get this working. This query where I select the same 6 part numbers takes 1.23 seconds:
select distinct PART_NUMBER from PARTS_REPORT
where PART_NUMBER in
('529711',
'627862',
'325712',
'979257',
'168570',
'004297')
The issue is when I combine these together.
In my mind, this query should take about 3 seconds or so. It takes 492 seconds:
select
PART_NO,
APRV_DT,
round((CURRENT_DATE - APRV_DT)/365.242199,1) as AGE,
rank() over (partition by PART_NO order by APRV_DT asc) rnk
FROM DB2_TABLE
where PART_NO in
(
select distinct PART_NUMBER from PARTS_REPORT
where PART_NUMBER in
('529711',
'627862',
'325712',
'979257',
'168570',
'004297')
)
Is there a better way to do this? Do I need to index my PARTS_REPORT table? What's the key here?
edit: to run all 200k-ish part numbers, the same query takes 564 seconds - around the time it takes what I have above to run.
Edit 2: the user below helped me to know what was going on - I have to pull down the whole remote table, and that's slow. I think I understand what's happening now - thanks.
Summarizing my comments as an answer:
In your first query you are providing an explicit IN list to the query, so it executes remotely on the DB2 server in its entirety, returning only the matching rows of data from DB2_TABLE.
When you attempt to retrieve the search criteria from a local table (which you do with where PART_NO in (select ... from PARTS_REPORT)) you force a join of a remote and a local table, for which the entire remote table has to be sent to the local server where the join is performed.
Sending the local table (or a subset thereof) to the remote server to perform the join there presumably requires less bandwidth. You could achieve this by declaring a temporary table on the remote server, loading it with the list of part numbers from the Oracle table, then performing your query against two remote tables, localizing the join there.
You already have some privileges in the remote database, which allow you to query its table(s); try and see if you can run DECLARE GLOBAL TEMPORARY TABLE -- by default it does not require any privileges beyond normal PUBLIC privileges.
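The idea of shipping the small key list to the remote side and joining there can be sketched roughly as follows. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the DB2 server and the local database, WANTED_PARTS is a hypothetical table name, and SQLite's CREATE TEMP TABLE takes the place of DB2's DECLARE GLOBAL TEMPORARY TABLE.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the two servers (assumption:
# the real setup is DB2 on the remote side and another database locally).
remote = sqlite3.connect(":memory:")
local = sqlite3.connect(":memory:")

remote.execute("CREATE TABLE DB2_TABLE (PART_NO TEXT, APRV_DT TEXT)")
remote.executemany(
    "INSERT INTO DB2_TABLE VALUES (?, ?)",
    [("529711", "2010-01-15"), ("627862", "2012-06-01"),
     ("999999", "2015-03-20"), ("529711", "2018-09-30")],
)

local.execute("CREATE TABLE PARTS_REPORT (PART_NUMBER TEXT)")
local.executemany(
    "INSERT INTO PARTS_REPORT VALUES (?)",
    [("529711",), ("529711",), ("627862",)],
)

# Step 1: read the (small) list of distinct keys from the local table.
keys = local.execute("SELECT DISTINCT PART_NUMBER FROM PARTS_REPORT").fetchall()

# Step 2: ship only those keys to the remote side, into a temporary table.
# WANTED_PARTS is a hypothetical name used for this sketch.
remote.execute("CREATE TEMP TABLE WANTED_PARTS (PART_NO TEXT PRIMARY KEY)")
remote.executemany("INSERT INTO WANTED_PARTS VALUES (?)", keys)

# Step 3: the join now runs entirely on the remote side, so only the
# matching rows travel back instead of the whole remote table.
rows = remote.execute(
    """
    SELECT d.PART_NO, d.APRV_DT
    FROM DB2_TABLE d
    JOIN WANTED_PARTS w ON w.PART_NO = d.PART_NO
    ORDER BY d.PART_NO, d.APRV_DT
    """
).fetchall()

for part_no, aprv_dt in rows:
    print(part_no, aprv_dt)
```

The point of the design is the direction of the data transfer: a few thousand keys go out, and only matching rows come back.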
I am running a POC environment with only one name node and one data node. The Impala daemon runs on the data node. Both nodes have 128GB of memory each. I have set mem_limit to 60GB.
I have two big tables in Impala. The first table has around 635 million records, while the second has around 250,000 records. I inner join these two tables on a common parameter. The SQL statement is as follows:
select a.*, b.* from table_a a inner join table_b b on a.param=b.param order by a.t_date desc
When I use EXPLAIN, it shows Estimated Per-Host Requirements: Memory=992.03MB VCores=2. When I ran this query, it took more than one hour and the result had still not been returned. I am wondering why it takes so long. Is this related to the mem_limit setting? How can I tune such a query?
Try tuning as described in the Impala performance documentation.
Some ideas:
Try big_table join small_table
Partition on the param column
If many queries execute at the same time, you should enable Admission Control (2) and Dynamic Resource Pools (3)
Run SUMMARY in impala-shell after executing your query to see which step takes a long time.
And please post the full result of the EXPLAIN statement.
P.S. Sorry, I don't have enough reputation to post more than 2 links.
I've used the solution (by AdaTheDev) from the thread below as it relates to the same question:
How to exclude records with certain values in sql select
But when applying the same query to over 40,000 records, the query takes too long to process (>30 mins). Is there another, more efficient way to query for records while excluding certain values (the same question as in the above Stack Overflow thread)? I've attempted to use the query below, still with no luck:
SELECT StoreId
FROM sc
WHERE StoreId NOT IN (
SELECT StoreId
FROM StoreClients
Where ClientId=5
);
Thanks --
You could use EXISTS:
SELECT StoreId
FROM sc
WHERE NOT EXISTS (
SELECT 1
FROM StoreClients cl
WHERE sc.StoreId = cl.StoreId
AND cl.ClientId = 5
)
Make sure to create an index on StoreId column. Your query could also benefit from an index on ClientId.
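As a rough sketch of the NOT EXISTS query together with the suggested indexes, the snippet below runs it against an in-memory SQLite database. The assumptions are mine, not the asker's: sc is treated as a table with a StoreId column, the sample rows are made up, the index names are hypothetical, and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed schema for this sketch.
    CREATE TABLE sc (StoreId INTEGER);
    CREATE TABLE StoreClients (StoreId INTEGER, ClientId INTEGER);

    -- Indexes along the lines suggested above (names are hypothetical).
    CREATE INDEX ix_sc_store ON sc (StoreId);
    CREATE INDEX ix_storeclients_store_client
        ON StoreClients (StoreId, ClientId);

    INSERT INTO sc VALUES (1), (2), (3);
    INSERT INTO StoreClients VALUES (1, 5), (1, 7), (2, 7), (3, 5);
    """
)

# Stores that have no StoreClients row with ClientId = 5.
rows = conn.execute(
    """
    SELECT StoreId
    FROM sc
    WHERE NOT EXISTS (
        SELECT 1
        FROM StoreClients cl
        WHERE sc.StoreId = cl.StoreId
          AND cl.ClientId = 5
    )
    ORDER BY StoreId
    """
).fetchall()

store_ids = [row[0] for row in rows]
print(store_ids)
```

With this sample data, stores 1 and 3 each have a row for client 5, so only store 2 remains.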
40,000 rows isn't too big for SQL Server. The other thread suggests some good queries too. If I understand correctly, your problem right now is performance.
To improve performance, you may need to add a new index to your table, or just update the statistics on your table.
If you are running this query in SQL Server Management Studio for testing and performance tuning, make sure you select "Include Actual Execution Plan". This will make the query take longer to execute, but the resulting execution plan can show you where the time is spent.
It usually suggests adding some indexes to improve performance, which is easy to follow.
If adding new indexes is not possible, then you will have to spend some time learning how to read the execution plan diagram and find the slow areas.
I have a SQL query that is scheduled to run every week and pulls data from a different database. The query runs for around 2 hours because of the amount of data it selects, and at the same time it drives up CPU utilization on the source SQL Server where database abc resides. The query is given below:
select a.* from abc.art_si a inner join abc.article b
on a.ARTICLEID = b.ARTICLEID where b.TYPE_IND='B'
I would like to know the following:
Will running this query use more CPU? If so,
is there any way to optimize the above query?
Your advice will be very helpful to me.
Thank you.
I have a table in MS Access 2010 that I'm trying to analyze. It lists people who belong to various groups and the jobs they have completed. What I would like to do is calculate the standard deviation of the count of the number of jobs each person has completed per group. Meaning, the output I would like is that for each group, I'd have a number that constitutes the standard deviation of how many jobs each person did.
The data is structured like this:
OldGroup, OldPerson, JobID
I know that I need to do a COUNT of the job IDs by Group and Person. I tried creating a subquery to work with, but that didn't work:
SELECT data.OldGroup, STDEV(
SELECT COUNT(data.JobID)
FROM data
WHERE data.Classification = 1
GROUP BY data.OldGroup, data.OldPerson
)
FROM data
GROUP BY data.OldGroup;
This returned an error "At most one record can be returned by this subquery," which I know is wrong, since when I tried to run the subquery as a standalone query it successfully returned more than one record.
Question:
How can I get the STDEV of a COUNT?
Subquestion: If this question can be answered by correcting incorrect syntax in my examples, please do so.
A minor change in strategy that wouldn't work for all cases but did end up working for this one seemed to take care of the problem. Instead of sticking the subquery in the SELECT statement, I put it in FROM, mimicking creating a separate table.
As such, my code looks like:
SELECT OldGroup, STDEV(NumberJobs) AS JobsStDev
FROM (
SELECT OldGroup, OldPerson, COUNT(JobID) AS NumberJobs
FROM data
WHERE data.Classification = 1
GROUP BY OldGroup, OldPerson
) AS TempTable
GROUP BY OldGroup;
That seemed to get the job done.
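For anyone who wants to try the two-level aggregation outside Access, here is a minimal sketch using Python's sqlite3 module. Everything around the query is an assumption for illustration: SQLite has no built-in STDEV, so a sample standard deviation aggregate (n - 1 denominator, which is what Access's StDev computes) is registered by hand, and the rows are made up.

```python
import sqlite3
import statistics


class SampleStdev:
    """Aggregate computing a sample standard deviation (n - 1 denominator)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Ignore NULLs, as SQL aggregates normally do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # A sample standard deviation needs at least two values.
        if len(self.values) < 2:
            return None
        return statistics.stdev(self.values)


conn = sqlite3.connect(":memory:")
conn.create_aggregate("STDEV", 1, SampleStdev)

conn.execute(
    "CREATE TABLE data (OldGroup TEXT, OldPerson TEXT, JobID INTEGER, "
    "Classification INTEGER)"
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        ("A", "p1", 1, 1),
        ("A", "p2", 2, 1), ("A", "p2", 3, 1), ("A", "p2", 4, 1),
        ("A", "p1", 5, 0),  # filtered out by Classification = 1
        ("B", "p3", 6, 1), ("B", "p3", 7, 1),
        ("B", "p4", 8, 1), ("B", "p4", 9, 1),
    ],
)

# Inner query: jobs per person per group. Outer query: STDEV of those counts.
rows = conn.execute(
    """
    SELECT OldGroup, STDEV(NumberJobs) AS JobsStDev
    FROM (
        SELECT OldGroup, OldPerson, COUNT(JobID) AS NumberJobs
        FROM data
        WHERE data.Classification = 1
        GROUP BY OldGroup, OldPerson
    ) AS TempTable
    GROUP BY OldGroup
    ORDER BY OldGroup
    """
).fetchall()

result = dict(rows)
print(result)
```

In this sample, group A has per-person counts of 1 and 3, and group B has counts of 2 and 2, so A gets a non-zero deviation and B gets zero.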
Try doing a make-table query for "SELECT COUNT(data.JobID)...."
Then for the 2nd query, use the new base table.
Sometimes it is just easier to do something in 2 or more queries.