How to optimize a query with a WHERE clause - SQL

I have a SQL query that is scheduled to run every week and pulls data from a different database. The query runs for around two hours because of the amount of data it selects, and while it runs it drives up CPU utilization on the source SQL Server where database abc resides. The query is given below:
select a.*
from abc.art_si a
inner join abc.article b on a.ARTICLEID = b.ARTICLEID
where b.TYPE_IND = 'B'
I would like to know the following:
Will running this query use a lot of CPU? If so, is there any way to optimize it?
Your advice will be very helpful to me.
Thank you.
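
For context, a common first step for a query shaped like this is an index that supports both the filter and the join; a minimal sketch, assuming no such indexes exist yet (the index names are hypothetical):

-- Lets the filter on TYPE_IND seek instead of scanning abc.article.
CREATE INDEX IX_article_typeind ON abc.article (TYPE_IND, ARTICLEID);
-- Lets the join seek into abc.art_si by ARTICLEID.
CREATE INDEX IX_artsi_articleid ON abc.art_si (ARTICLEID);

Selecting only the columns you actually need instead of a.* would also cut the amount of data read and transferred.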

Related

SQL Server - Trying to find what is spamming this table

After looking in Azure's QueryPerformanceInsight for a SQL database I use daily, I have noticed that a query is being executed a huge number of times per day (thousands per hour, it seems). The query is basically:
select * from TableName t where t.Id = id
I am trying to hunt down what is executing this query so many times in such a short space of time, and have put this together:
select top 1 txt.[text], stat.last_execution_time, stat.execution_count, plans.*, stat.*
from sys.dm_exec_cached_plans as plans
join sys.dm_exec_query_stats as stat on plans.plan_handle = stat.plan_handle
cross apply sys.dm_exec_sql_text(plans.plan_handle) as txt
order by stat.execution_count desc
This shows me the query I am looking for (the text is the same as shown in Azure Insights), and it also shows how quickly the execution count is rising.
The issue is, I don't know how to find where this query is being executed from! None of the three DMVs I am referencing in the above query have a session ID column on them, so I am unsure where to go from here!
If anyone has any experience hunting down a query being spammed against a database over and over, please help!
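
A minimal sketch of one way to catch the caller, assuming the statement runs often enough to show up among the currently executing requests (the LIKE filter text is an assumption):

-- Snapshot currently running requests together with who and where they come from.
select s.session_id, s.host_name, s.program_name, s.login_name, txt.[text]
from sys.dm_exec_requests as r
join sys.dm_exec_sessions as s on s.session_id = r.session_id
cross apply sys.dm_exec_sql_text(r.sql_handle) as txt
where txt.[text] like '%TableName%'; -- match the spammed query's text

host_name and program_name usually narrow it down to the machine and application issuing the calls.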

Impala Query Performance

I am running in a POC environment where there is only one name node and one data node; the Impala daemon runs on the data node. Both nodes have 128 GB of memory each. I have set mem_limit to 60 GB.
I have two big tables in Impala. The first table has around 635 million records, while the second table has around 250,000 records. I inner join these two tables on a common parameter. The SQL statement is the following:
select a.*, b.* from table_a a inner join table_b b on a.param=b.param order by a.t_date desc
When I use EXPLAIN, it shows Estimated Per-Host Requirements: Memory=992.03MB VCores=2. When I run this query, it takes more than one hour and the result has yet to be returned. I am wondering why it takes so long. Is this related to the mem_limit setting? How can I tune such a query?
Try tuning as described in the Impala performance guidelines. Some ideas:
Write the join as big_table JOIN small_table: by default Impala broadcasts the right-hand table, so the small table should go on the right.
Partition on the param column.
If many queries execute at the same time, you should enable Admission Control (2) and Dynamic Resource Pools (3).
Run SUMMARY after executing your query in impala-shell to see which step takes the longest.
And please post the full output of the EXPLAIN statement.
P/S: Sorry, I don't have enough reputation to post more than 2 links.
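
As a concrete starting point, here is a minimal sketch of the usual first steps, assuming statistics have never been computed on these tables (the LIMIT value is an assumption about the use case):

-- Give the planner row counts and column stats so it can pick a sensible join strategy.
COMPUTE STATS table_a;
COMPUTE STATS table_b;

-- Big table on the left, small table on the right (the right side is broadcast).
-- A LIMIT avoids sorting all 635 million joined rows if only recent rows are needed.
SELECT a.*, b.*
FROM table_a a
INNER JOIN table_b b ON a.param = b.param
ORDER BY a.t_date DESC
LIMIT 100;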

How to speed up this simple query

With SourceTable having more than 15 million records and Bad_Phrase having more than 3,000 records, the following query takes almost 10 hours to run on SQL Server 2005 SP4.
Update [SourceTable]
Set Bad_Count = (Select count(*)
                 from Bad_Phrase
                 where [SourceTable].Name like '%' + Bad_Phrase.PHRASE + '%')
In English, this query counts how many of the phrases listed in Bad_Phrase occur as a substring of the column [Name] in SourceTable, and then places that result in the column Bad_Count.
I would like some suggestions on how to have this query run considerably faster.
For lack of a better idea, here is one:
I don't know if SQL Server natively supports parallelizing an UPDATE statement, but you can try to do it yourself by manually partitioning the work that needs to be done.
For instance, just as an example, if you can run the following two UPDATE statements in parallel, manually or by writing a small app, I'd be curious to see whether you can bring down your total processing time.
Update [SourceTable]
Set Bad_Count = (Select count(*)
                 from Bad_Phrase
                 where [SourceTable].Name like '%' + Bad_Phrase.PHRASE + '%')
where Name < 'm'

Update [SourceTable]
Set Bad_Count = (Select count(*)
                 from Bad_Phrase
                 where [SourceTable].Name like '%' + Bad_Phrase.PHRASE + '%')
where Name >= 'm'
So the first UPDATE statement takes care of updating all the rows whose names start with the letters a-l, and the second takes care of m-z.
It's just an idea, and you can try splitting this into smaller chunks and more parallel update statements, depending on the capacity of your SQL Server machine.
It sounds like your query is scanning the whole table. Do your tables have proper indexes on them? Putting an index on columns that appear in a WHERE clause is a good place to start. You can also get the cost of the query in SQL Server Management Studio (Display Estimated Execution Plan, or, if you're willing to wait, Include Actual Execution Plan; both are buttons in the query window). The cost will provide insight into what is taking forever and possibly steer you toward writing faster queries.
You are updating the table using a subquery against the same table, so every row update may scan the whole table, which can cause the excessive execution time. I think it is better to first insert all the counts into a #temp table and then use the #temp table in your update statement, or to JOIN the source table and the temp table.
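
A minimal sketch of that temp-table approach, assuming SourceTable has a key column named Id (the key column name is an assumption):

-- Compute the per-row phrase counts once, into a temp table.
Select s.Id, count(*) as Bad_Count
into #Counts
from [SourceTable] s
join Bad_Phrase b on s.Name like '%' + b.PHRASE + '%'
group by s.Id;

-- Apply the counts with a join; rows matching no phrase get 0.
Update s
Set s.Bad_Count = coalesce(c.Bad_Count, 0)
from [SourceTable] s
left join #Counts c on c.Id = s.Id;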

Subquery vs. inner join in SQL Server

I have the following queries.
The first one uses an inner join:
SELECT item_ID, item_Code, item_Name
FROM [Pharmacy].[tblitemHdr] I
INNER JOIN EMR.tblFavourites F ON I.item_ID = F.itemID
WHERE F.doctorID = @doctorId AND F.favType = 'I'
The second one uses a subquery:
SELECT item_ID, item_Code, item_Name
FROM [Pharmacy].[tblitemHdr]
WHERE item_ID IN
    (SELECT itemID FROM EMR.tblFavourites
     WHERE doctorID = @doctorId AND favType = 'I')
Here the item table [Pharmacy].[tblitemHdr] contains 15 columns and 2,000 records, and EMR.tblFavourites contains 5 columns and around 100 records. In this scenario, which query gives me better performance?
Usually joins work faster than subqueries, but in reality it will depend on the execution plan generated by SQL Server. No matter how you write your query, SQL Server will always transform it into an execution plan. If it is "smart" enough to generate the same plan from both queries, you will get the same result.
Here and here are some links to help.
In SQL Server Management Studio you can enable "Client Statistics" and also Include Actual Execution Plan. This will give you the ability to know precisely the execution time and load of each request.
Also, between requests, clear the cache to avoid cache side effects on performance:
USE <YOURDATABASENAME>;
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
I think it's always better to see with your own eyes than to rely on theory!
Subquery vs. join:
Table one: 20 rows, 2 columns. Table two: 20 rows, 2 columns.
The subquery costs on the order of 20*20 scans; the join costs on the order of 20*2.
That is the logical picture. In detail, the scan count indicates the multiplication effect, as the system has to go through the data again and again to fetch it. For your performance measure, just look at the time.
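
A minimal way to see those scan counts and timings for yourself in SQL Server:

SET STATISTICS IO ON;   -- reports scan count and logical reads per table
SET STATISTICS TIME ON; -- reports CPU time and elapsed time

-- ...run the join version and the subquery version here, then compare
-- the "Scan count" and "logical reads" lines on the Messages tab.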
A join is faster than a subquery.
A subquery makes for busy disk access: think of the hard disk's read-write head going back and forth as it accesses User, SearchExpression, PageSize, DrilldownPageSize, User, SearchExpression, PageSize, DrilldownPageSize, User... and so on.
A join works by concentrating the operation on the result of the first two tables; any subsequent join concentrates on the in-memory (or cached-to-disk) result of the already-joined tables, and so on. Less read-write head movement, thus faster.
Source: Here
The first query is better than the second, because in the first query we are joining both tables.
Also check the execution plan for both queries...

SQL Server - joining 4 fast queries gives me one slow query

I have 4 views in my MS SQL Server database which are all quite fast (less than 2 seconds) and each returns fewer than 50 rows.
BUT when I create a query that joins those 4 views (left outer joins), I get a query which takes almost one minute to finish.
I think the query optimizer is doing a bad job here. Is there any way to speed this up? I am tempted to copy each of the 4 views into a table and join those together, but that seems like too much of a workaround to me.
(Side note: I can't set any indexes on any tables because the views come from a different database and I am not allowed to change anything there, so this is not an option.)
EDIT: I am sorry, but I don't think posting the SQL queries will help. They are quite complex and use around 50 different tables. I cannot post an execution plan either, because I don't have enough access rights to generate one on some of the databases.
I guess my best solution right now is to generate temporary tables to store the results of each query.
If you can't touch indexes, then to speed things up you can put the results of your 4 queries in 4 temp tables and then join them.
You can do this in a stored procedure.
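
A minimal sketch of that approach (the view names, join key, and join shape are assumptions; the real query left-outer-joins all four views):

-- Materialize each fast view once, then join the small temp tables.
SELECT * INTO #v1 FROM dbo.View1;
SELECT * INTO #v2 FROM dbo.View2;
SELECT * INTO #v3 FROM dbo.View3;
SELECT * INTO #v4 FROM dbo.View4;

SELECT v1.*
FROM #v1 v1
LEFT OUTER JOIN #v2 v2 ON v2.KeyCol = v1.KeyCol
LEFT OUTER JOIN #v3 v3 ON v3.KeyCol = v1.KeyCol
LEFT OUTER JOIN #v4 v4 ON v4.KeyCol = v1.KeyCol;

Because each temp table holds fewer than 50 rows, the optimizer no longer has to expand all four view definitions into one enormous plan.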
You can also use derived tables of the views while joining.
EXAMPLE: Instead of having this query
SELECT V1.* FROM dbo.View1 AS V1 INNER JOIN dbo.View2 as V2
ON V1.Column1=V2.Column1;
you can have the below query
SELECT V1.* FROM (SELECT * FROM dbo.View1) AS V1 INNER JOIN (SELECT * FROM dbo.View2) AS V2
ON V1.Column1=V2.Column1;
I hope this can improve the performance.
If you have many columns, only include the columns you need. In particular, if you have many math operations on the columns, the database has to convert all of the numbers when it returns the results.
One more point: it is sometimes better to run 3 separate queries than to make one huge join and run 1 query.
Without specifics, however, it is difficult to give the right advice beyond generalities.
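
For instance, a minimal sketch of trimming the select list (the column names are hypothetical):

-- Instead of SELECT V1.*, V2.* ..., name only what the consumer actually uses:
SELECT V1.OrderId, V1.OrderDate, V2.CustomerName
FROM dbo.View1 AS V1
LEFT OUTER JOIN dbo.View2 AS V2 ON V2.OrderId = V1.OrderId;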