How to optimise a slow SQL query

I need help optimising this query. In a stored procedure this part runs for 1 hour (the whole procedure needs 2 hours to execute). The procedure works on a large amount of data. The query works with two temporary tables, and both have indexes:
create unique clustered index #cx_tDuguje on #tDuguje (Partija, Referenca, Konto, Valuta, DatumValute)
create nonclustered index #cx_tDuguje_1 on #tDuguje (Partija, Valuta, Referenca, Konto, sIznos)
create unique clustered index #cx_tPotrazuje on #tPotrazuje (Partija, Referenca, Konto, Valuta, DatumValute)
create nonclustered index #cx_tPotrazuje_1 on #tPotrazuje (Partija, Valuta, Referenca, Konto, pIznos)
And this is a query:
select D.Partija,
D.Referenca,
D.Konto,
D.Valuta,
D.DatumValute DatumZad,
NULLIF(MAX(COALESCE(P.DatumValute, @NextDay)), @NextDay) DatumUpl,
MAX(D.DospObaveze) DospObaveze,
MAX(D.LimitMatZn) LimitMatZn
into #dwkKasnjenja_WNT
from #tDuguje D
left join #tPotrazuje P on D.Partija = P.Partija
AND D.Valuta = p.Valuta
AND D.Referenca = p.Referenca
AND D.Konto = P.Konto
and P.pIznos < D.sIznos and D.sIznos <= P.Iznos
WHERE 1=1
AND D.DatumValute IS NOT NULL
GROUP BY D.Partija, D.Referenca, D.Konto, D.Valuta, D.DatumValute
I have an execution plan, but I am not able to post it here.

Just an idea: if you are permitted to do so, try to change the business logic first.
Maybe you can constrain the result set to only include data from a point in time that is meaningful.
How far back in time does your account data reach? Do you really need to include all the data all the way back to good old 1999?
Maybe you can add
D.DatumValute >= 'Jan 1 2010'
or something similar to your WHERE clause.
This might create a much smaller temporary result set to feed your complicated JOIN, which would then run faster.
If you can't do this, maybe do a
"select top 1000 ... order by datum desc" query, which might run faster, and then, if the user really needs it, perform the slow-running query in a second step.

Difficult to say without having an execution plan or some hints about the number of rows in each table.
Replace this index
create nonclustered index #cx_tPotrazuje_1 on #tPotrazuje (Partija, Valuta, Referenca, Konto, pIznos)
by
create nonclustered index #cx_tPotrazuje_1 on #tPotrazuje (Partija, Valuta, Referenca, Konto, sIznos, pIznos)

Creating an index is an expensive process and can sometimes be slow, depending on the workload of the instance and on the columns for which the index is created.
Furthermore, it's very difficult to say what you need to optimise without an execution plan and without knowing something about the data types of the columns involved in the index creation.
For example, the data types of the columns Partija, Referenca, Konto, Valuta and DatumValute are not clear.
You should tell us the data types of the columns involved in the creation of your indexes.

Related

What Columns Should I Index to Improve Performance in SQL

In my query I have a temp table of keys that will be joined to multiple tables later on.
I want to create an index on my temp table to improve performance, because the query takes a couple of minutes to run.
SELECT DISTINCT
k.Id, k.Name, a.Address, a.City, a.State, a.Zip, p.Phone, p.Fax, ...
FROM
#tempKeys k
INNER JOIN
dbo.Address a ON a.AddrId = k.AddrId
INNER JOIN
dbo.Phone p ON p.PhoneId = a.PhoneId
...
My question is: should I create a separate index for each column that is being joined,
CREATE NONCLUSTERED INDEX ... (AddrId ASC)
CREATE NONCLUSTERED INDEX ... (PhoneId ASC)
or can I create one index that includes all columns being joined
CREATE NONCLUSTERED INDEX ... (AddrId ASC, PhoneId ASC)
Also, are there other ways I can improve performance on this scenario?
As @DaleK says, this is a complex topic. In general, though, an index is only usable when all the leading columns are used. Your suggestion of a composite index will likely not work: the indexed value of PhoneId cannot be used independently of AddrId. (The index would be fine for AddrId on its own.)
The best approach is to have a test database with representative data and volumes, then check the query plan and its suggestions. Don't forget that every index you add has a cost on every insert.
Another factor is that without a WHERE clause, or with larger result sets (I think over 5-10% of the table), the optimiser will often decide it's faster not to use indexes anyway.
And I'd rethink using temp tables at all, let alone indexed ones. They're rarely necessary. A single, large query usually runs faster (and has better data integrity, depending on your isolation model) than one split into chunks.
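If you do keep the temp table, a minimal sketch of the one index the join above can actually use (the index name is made up; only AddrId lives on #tempKeys, while PhoneId comes from dbo.Address):
-- A single-column index on the temp table's join key; a composite with PhoneId would add nothing here.
CREATE NONCLUSTERED INDEX IX_tempKeys_AddrId ON #tempKeys (AddrId);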

Fastest Way To Get Count From A Table With Conditions?

I am using sql server 2017 and EF Core 2.2. One of my tables right now has 5 million records in it.
I want to group all these records by "CategoryId" and then have a count for each one.
I also need to filter out with a where clause.
However, even if I write the query in plain SQL, it still takes around a minute to get these numbers.
This is way too slow and I need something faster.
select CategoryId, count(*) from Items where Deleted = 'False'
group by CategoryId
I am guessing that EF Core probably won't have a solution that is fast enough, so I am open to using ADO.NET if needed. I just need something fast.
Consider creating an indexed view to materialize the aggregation:
CREATE VIEW dbo.ItemCategory
WITH SCHEMABINDING
AS
SELECT CategoryId, COUNT_BIG(*) AS CountBig
FROM dbo.Items
WHERE Deleted = 'False'
GROUP BY CategoryId;
GO
CREATE UNIQUE CLUSTERED INDEX cdx_ItemCategory
ON dbo.ItemCategory (CategoryId);
GO
Using this view for the aggregated result will improve performance significantly:
SELECT CategoryId, CountBig
FROM dbo.ItemCategory;
Depending on your SQL Server edition, you may need to specify the NOEXPAND hint for the view index to be used:
SELECT CategoryId, CountBig
FROM dbo.ItemCategory WITH (NOEXPAND);
You'd better add indexes on Deleted and CategoryId.
Or put all deleted items in a separate table.
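A minimal sketch of that first suggestion, assuming the dbo.Items table and the column names from the question (the index name is made up):
-- Lets the grouped count be answered from the index instead of scanning the whole table.
CREATE NONCLUSTERED INDEX IX_Items_Deleted_CategoryId
ON dbo.Items (Deleted, CategoryId);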
You should have a covering index for your query to make it fast; other than that there is no shortcut, since your query has to read every page of the table to count the category IDs.
I have a table with 5 million rows, of which almost 4.7 million have the deleted flag set to False. Without the covering index my query takes about 12 seconds, with the execution plan doing a scan on the clustered index.
Once I create the following covering index on my table, the query executes in less than a second; the execution plan looks much the same, except that it does a seek on the nonclustered index rather than a scan on the clustered index:
Index Definition:
CREATE NONCLUSTERED INDEX [Test_Index]
ON [dbo].[Test] ([IsDeleted])
INCLUDE ([CategoryId])
With this covering index SQL Server only needs to look into the index and return the results, rather than reading your whole table.
If you really want to speed this query up further, there is another, very specific option: create a filtered index tailored to your query.
Index definition would be:
CREATE NONCLUSTERED INDEX [Test_Index2]
ON [dbo].[Test] ([CategoryId])
WHERE IsDeleted = 'False'
With this filtered index my query was practically instant. I didn't capture IO/time statistics for it, but I would expect a few milliseconds. The execution plan changed slightly with this index.

How should I create this index?

I have queries that look like:
select blah, foo
from tableA
where makedate = @somedate
select bar, baz
from tableA
where vendorid = @someid
select foobar, onetwo
from tableA
where vendorid = @someid and makedate between @date1 and @date2
Should I create just one index:
create nonclustered index searches_index on tableA(vendorid, makedate)
Should I create 3 indexes:
create nonclustered index searches_index on tableA(vendorid,
makedate)
create nonclustered index searches_index on tableA(vendorid)
create nonclustered index searches_index on tableA(makedate)
Also are these two different indexes? In other words, does column order matter?
create nonclustered index searches_index on tableA(vendorid, makedate)
create nonclustered index searches_index on tableA(makedate, vendorid)
I've been reading up on indexes, but I'm not sure of the best way to create them.
Neither of your suggestions is optimal.
You should create two indexes:
create nonclustered index searches_index on tableA(vendorid, makedate)
create nonclustered index searches_index on tableA(makedate)
The reason is that the first index on (vendorid, makedate) will be used for both the second and third of your sample queries; an index on (vendorid) alone would be redundant.
[Edit] To answer your additional question:
Yes, column order does matter in index creation. An index on (vendorid, makedate) can be used to optimize queries of the form WHERE vendorid = ? AND makedate = ? or WHERE vendorid = ? but cannot help with the query WHERE makedate = ?. In order to get any significant index optimization on the last query you would need an index with makedate at the head of the index. (Note that in my example queries "=" means any optimizable condition).
There exist some edge cases in which an otherwise unhelpful index (like (vendorid, makedate) in a query against makedate only) can provide some nominal help in returning data, as @Bram points out in the comments. For instance, if you return only the columns makedate and vendorid in that query, then the SQL engine can treat the index as a mini-table and sequentially scan it to find the matching rows, never having to look at the full copy of the table. This is called a covering index.
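For instance, a sketch of that edge case using the table and columns from the question (the parameter is illustrative):
-- The (vendorid, makedate) index contains both output columns, so the engine can scan the index
-- like a narrow copy of the table instead of touching tableA at all.
SELECT vendorid, makedate
FROM tableA
WHERE makedate = @somedate;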
If it were me, and I knew that the table was always going to be queried in one of those three ways you listed, I would create three indexes as you suggested. Those are pretty light-weight indexes to have, so I wouldn't be concerned (generally speaking) about the other costs that multiple indexes will incur.
Just my opinion, I'm sure there are others contrary.
Also, a side note: when creating three indexes, they will each need a unique name.
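As a sketch, the three-index option with distinct (made-up) names, using the table and columns from the question:
CREATE NONCLUSTERED INDEX IX_tableA_vendorid_makedate ON tableA (vendorid, makedate);
CREATE NONCLUSTERED INDEX IX_tableA_makedate ON tableA (makedate);
CREATE NONCLUSTERED INDEX IX_tableA_vendorid ON tableA (vendorid); -- arguably redundant with the composite above, per the earlier answer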

Hint for SQL Query Joined tables with million records

I have the query below, which takes more than 5 seconds on average to fetch data, in a transaction that is triggered numerous times by the application. I am looking for a hint that can help me reduce the time taken by this query every time it is fired. My constraints are that I cannot add any indexes or change any application settings for this query. Hence Oracle hints, or changing the structure of the query, are the only choices I have. Please find my query below.
SELECT SUM(c.cash_flow_amount) FROM CM_CONTRACT_DETAIL a ,CM_CONTRACT b,CM_CONTRACT_CASHFLOW c
WHERE a.country_code = Ip_country_code
AND a.company_code = ip_company_code
AND a.dealer_bp_id = ip_bp_id
AND a.contract_start_date >= ip_start_date
AND a.contract_start_date <= ip_end_date
AND a.version_number = b.current_version
AND a.status_code IN ('00','10')
AND a.country_code = b.country_code
AND a.company_code = b.company_code
AND a.contract_number = b.contract_number
AND a.country_code = c.country_code
AND a.company_code = c.company_code
AND a.contract_number = c.contract_number
AND a.version_number = c.version_number
AND c.cash_flow_type_code IN ('07','13');
The things to know about these tables are that they are all transactional tables and their data changes every day. Each has roughly 100,000 to 1,000,000 rows (1 to 10 lakh).
This is the explain plan currently on the query:
Operation                                Object Name                 Rows Bytes Cost Object Node In/Out PStart PStop
SELECT STATEMENT  Hint=RULE
  SORT AGGREGATE
    TABLE ACCESS BY INDEX ROWID          CM_CONTRACT_CASHFLOW
      NESTED LOOPS
        NESTED LOOPS
          TABLE ACCESS BY INDEX ROWID    CM_CONTRACT_DETAIL
            INDEX RANGE SCAN             XIF760CT_CONTRACT_DETAIL
          TABLE ACCESS BY INDEX ROWID    CM_CONTRACT
            INDEX UNIQUE SCAN            XPKCM_CONTRACT
        INDEX RANGE SCAN                 XPKCM_CONTRACT_CASHFLOW
Indexes on CM_CONTRACT_DETAIL:
XPKCM_CONTRACT_DETAIL is a composite unique index on country_code, company_code, contract_number and version_number
XIF760CT_CONTRACT_DETAIL is a non unique index on dealer_bp_id
Indexes on CM_CONTRACT:
XPKCM_CONTRACT is a composite unique index on country_code, company_code, contract_number
Indexes on CM_CONTRACT_CASHFLOW:
XPKCM_CONTRACT_CASHFLOW is a composite unique index on country_code, company_code, contract_number, version_number, supply_sequence_number, cash_flow_type_code and payment_date.
Could you please help improve this query? Please let me know if anything else about the tables is needed. Stats are not gathered on these tables either.
Your query plan says Hint=RULE. Why is that? Is this the standard setting in your DBMS? Why not make use of the optimizer? You can use /*+ CHOOSE */ for that. This may be all that's needed. (Why are there no stats on the tables, though?)
EDIT: The above was nonsense. By not gathering any statistics you prevent the optimizer from doing its work. It will always fall back to the good old rules, because it has no basis on which to calculate costs and find a better plan. It is strange to see that you voluntarily keep the DBMS from making your queries fast. You can of course use hints in your queries, but be careful to always check and adjust them when the table data changes significantly. Better to gather statistics and let the optimizer do this work. As to useful hints:
My feeling says: with that many criteria on CM_CONTRACT_DETAIL, it should be the driving table. You can force that with /*+ LEADING(a) */. Maybe even use a full table scan on that table with /*+ FULL(a) */, which you can speed up further with parallel execution: /*+ PARALLEL(a,4) */.
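A sketch of how those hints could sit in the original query; treat it as illustrative, since whether FULL and PARALLEL actually help depends on your data volumes and environment:
-- Drive from CM_CONTRACT_DETAIL, scan it in full, and parallelise that scan.
SELECT /*+ LEADING(a) FULL(a) PARALLEL(a,4) */
       SUM(c.cash_flow_amount)
FROM   CM_CONTRACT_DETAIL a, CM_CONTRACT b, CM_CONTRACT_CASHFLOW c
WHERE  a.country_code = ip_country_code
AND    a.company_code = ip_company_code
AND    a.dealer_bp_id = ip_bp_id
AND    a.contract_start_date >= ip_start_date
AND    a.contract_start_date <= ip_end_date
AND    a.status_code IN ('00','10')
AND    a.version_number = b.current_version
AND    a.country_code = b.country_code
AND    a.company_code = b.company_code
AND    a.contract_number = b.contract_number
AND    a.country_code = c.country_code
AND    a.company_code = c.company_code
AND    a.contract_number = c.contract_number
AND    a.version_number = c.version_number
AND    c.cash_flow_type_code IN ('07','13');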
Good luck :-)

What is a Covered Index?

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of the columns you need for your query, and possibly more.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
and, provided that particular index can be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows; it can produce the results directly from the index.
This can also be exploited when a typical query uses 1-2 columns to resolve which rows and then typically fetches another 1-2 columns: it can be beneficial to append those extra columns (if they're the same across queries) to the index, so that the query processor can get everything from the index itself.
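In SQL Server the usual way to "append those extra columns" is INCLUDE; a minimal sketch, reusing the hypothetical tablename and column names from the example above and assuming column3 is the column the criteria filters on:
-- Key on the filtering column; carry the selected columns in the index leaf pages only.
CREATE NONCLUSTERED INDEX IX_tablename_covering
ON tablename (column3)
INCLUDE (column1, column2);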
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
A covering index is just an ordinary index. It's called "covering" if it can satisfy a query without needing to access the table data at all.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/update overhead, due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly and more likely than single-table retrievals to suffer high-cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select poi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Let's say you have a simple table with the columns below, and you have only indexed Id:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the query below and check whether it uses the index and whether it performs efficiently, without extra I/O calls. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check the performance of this query you will be disappointed: since Telephone_Number is not indexed, the rows have to be fetched from the table using I/O calls. So this is not a covering index, because a column in the query is not part of the index, which leads to frequent I/O calls.
To make it a covered index you need to create a composite index on (Id, Telephone_Number).
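As a sketch against the hypothetical mytable above (the index name is made up): the answer lists (Id, Telephone_Number), but leading with Telephone_Number instead lets the WHERE clause seek on it while Id keeps the index covering, in line with the column-order discussion earlier on this page.
-- Covers SELECT Id ... WHERE Telephone_Number = ...: seek on Telephone_Number, return Id straight from the index.
CREATE INDEX IX_mytable_Phone_Id ON mytable (Telephone_Number, Id);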
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/