DB Design: Looking for a performance improvement when a BIT column from every table is used in every SQL query

I recently got added to a new ASP.NET project (a web application). There have been recent performance issues with the application, and I am on a team whose current task is to optimize some slow-running stored procedures.
The database design is highly normalized. All the tables have a BIT column named [Status_ID]. In every stored procedure, for every T-SQL query, this column appears in the WHERE condition for all the tables involved.
Example:
Select A.Col1,
C.Info
From dbo.table1 A
Join dbo.table2 B On A.id = B.id
Left Join dbo.table21 C On C.map = B.Map
Where A.[Status_ID] = 1
And B.[Status_ID] = 1
And C.[Status_ID] = 1
And A.link > 50
In the above SQL, three tables are involved, and the [Status_ID] column from all three is used in the WHERE condition. This is just an example; [Status_ID] appears like this in almost all the queries.
When I look at the execution plans of most of the SPs, there are a lot of Key Lookup (Clustered) operators, and most of them are looking up [Status_ID] in the respective table.
In the application, I found that it is not possible to avoid these column checks in the queries. So
Will it be a good idea to
Alter all [Status_ID] columns to NOT NULL, and then add them to the PRIMARY KEY of their table, so that key values 12, 13, ... become (12, 1), (13, 1), ...
Add the [Status_ID] column to all the nonclustered indexes on that table, in the INCLUDE part.
Please share your suggestions on the above two points, as well as any others.
Thanks for reading.

If you add Status_ID to the PK you change the definition of the PK
If you add Status_ID to the PK then you could have duplicate IDs
And changing Status_ID would fragment the index
Don't do that
The PK should be what makes the row unique
Add a separate nonclustered index for Status_ID
And if the column never actually contains NULL, change it to NOT NULL
This will only cut the workload in half
Another option is to add [Status_ID] to every other nonclustered index.
But if it is the first key column, it only cuts the workload in half.
And if it is the second key column, it is only effective if the other column of the index is also in the query.
Try Status_ID as a separate index first.
I suspect the query optimizer will be smart enough to evaluate it last, since it will be the least selective index.
If you don't have an index on link, add one.
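As a rough sketch of those two indexing options (the index names here are just examples, not from the original post):

-- Separate nonclustered index on the status flag
CREATE NONCLUSTERED INDEX IX_table1_Status ON dbo.table1 ([Status_ID]);

-- Or add [Status_ID] as an included column on an existing nonclustered index,
-- which removes the key lookup without changing the index key
CREATE NONCLUSTERED INDEX IX_table1_link ON dbo.table1 (link) INCLUDE ([Status_ID]);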
And try changing the query.
Sometimes this helps the query optimizer:
Select A.Col1, C.Info
From dbo.table1 A
Join dbo.table2 B
    On A.id = B.id
    And A.[Status_ID] = 1
    And A.link > 50
    And B.[Status_ID] = 1
Left Join dbo.table21 C
    On C.map = B.Map
    And C.[Status_ID] = 1
Check the fragmentation of the indexes
Check the type of join
If it is using a loop join then try join hints
This query should not be performing poorly
It might be lock contention.
Try with (nolock).
That might not be an acceptable long-term solution, but it would tell you if locks are the problem.
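For a quick test, that would look something like this on the example query above (diagnostic only; (nolock) can read uncommitted data):

Select A.Col1, C.Info
From dbo.table1 A With (nolock)
Join dbo.table2 B With (nolock) On A.id = B.id
Left Join dbo.table21 C With (nolock) On C.map = B.Map
Where A.[Status_ID] = 1 And B.[Status_ID] = 1 And C.[Status_ID] = 1 And A.link > 50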

Related

Does it make sense to index a table with just one column?

I wonder if it makes sense to index a table which contains just a single column? The table will be populated with hundreds or thousands of records and will be used to JOIN to another (larger) table in order to filter its records.
Thank you!
Yes and no. An explicit index probably does not make sense. However, defining the single column as a primary key is often done (assuming it is never NULL and unique).
This is actually a common practice. It is not uncommon for me to create exclusion tables, with logic such as:
from . . .
where not exists (select 1 from exclusion_table et where et.id = ?.id)
A primary key index can speed up such a query.
In your case, it might not make a difference if the larger table has an index on the id used for the join. However, you give the optimizer the option of choosing which index to use.
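As a rough sketch of that exclusion-table pattern (the surrounding table and column names are made up for illustration):

CREATE TABLE exclusion_table (
    id INT NOT NULL PRIMARY KEY   -- the PK gives you the index without creating one explicitly
);

SELECT t.*
FROM big_table t
WHERE NOT EXISTS (SELECT 1 FROM exclusion_table et WHERE et.id = t.id);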
My vote is that it probably doesn't really make sense in your scenario. You're saying this table with a single column will be joined to another table to filter records in the other table, so why not just drop this table, index that column in the other table, and filter on that?
Essentially, why are you writing:
SELECT * FROM manycols M INNER JOIN singlecol s ON m.id = s.id WHERE s.id = 123
When that is effectively the same as this:
SELECT * FROM manycols m WHERE m.id = 123
Suppose the argument is that manycols has a million rows and singlecol has a thousand. You want the thousand matching rows; it's manycols that would need to be indexed for the benefit.
Suppose the argument is that you want all rows except those in singlecol; you could index singlecol, but the optimiser might choose to just load the entire table into a hash anyway, so again, indexing it wouldn't necessarily help.
It feels like there's probably another way to do what you require that ditches this single-column table entirely.

Cardinality estimation when the foreign key is limited to a small subset

Recently I have been optimizing the performance of a large-scale ERP package.
One of the performance issues I haven't been able to solve involves bad cardinality estimation for a foreign key which is limited to a very small subset of records from a large table.
Table A holds 3 million records and has a type field.
Table B holds 7 million records and holds a foreign key FK to Table A.
The foreign key will only ever be filled with primary keys from Table A that have a certain type; only 36 of the 3 million records in Table A have this type.
B JOIN A ON B.FK = A.PK AND A.TYPE = X AND A.Name = Y
Now, using the correct statistics, SQL knows Table A will only return 1 row.
But SQL estimates that only 2 records will be returned from Table B (my guess is 7 million / 3 million), while actually 930,000 records are returned.
This results in a slow query plan being used.
The real query is more complex, but the cause of the bad query plan is this simplified statement.
Our DB does have accurate statistics for the FK (the histogram shows EQ_ROWS for every distinct value of this FK), and filtering on a fixed FK value does result in accurate estimates.
Is there any way to show SQL that this FK can only hold a small number of distinct values, or to help it with the estimates in any other way?
If we had the chance, we would split up the table and put these 36 records in a separate table, but unfortunately this is how the ERP system works.
Extra info:
We are using SQL 2014.
The ERP system is Dynamics AX 2012 R3
Using trace flag 9481 does help (not perfect, but a lot better), but unfortunately we cannot set trace flags for individual queries with Dynamics AX.
I've encountered these kinds of problems before, and have often found that I can dramatically reduce the total run time of a stored proc or script by pulling those 'very few relevant rows' from a large table into a small temp table and then joining that temp table into the main query later, or by using CTEs to isolate the few needed rows. A little experimentation should quickly tell you if this has potential in your case.
Look at the query plan
Clearly you want it to filter on TYPE early
It is probably doing a loop join.
FROM B
JOIN A
ON B.FK = A.PK
AND A.TYPE = X
AND A.Name = Y
Try the various join hints.
Next would be to create a #temp table and join to it.
Declare a PK on your temp table.
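A rough sketch of that #temp approach, reusing the simplified names from the question (X and Y stand in for the literal values, and #filteredA is an invented name):

SELECT A.PK, A.Name
INTO #filteredA
FROM A
WHERE A.TYPE = X
  AND A.Name = Y;                      -- only the handful of qualifying rows land here

ALTER TABLE #filteredA ADD CONSTRAINT PK_filteredA PRIMARY KEY (PK);

SELECT B.*
FROM B
JOIN #filteredA fa ON B.FK = fa.PK;    -- the tiny, accurately sized temp table drives the join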

SQL Query very slow - Suddenly

I have a SQL stored procedure that was running perfectly (0.2 secs execution or less); suddenly, today, it is taking more than 10 minutes.
I see that the issue comes from a LEFT JOIN to a documents table (which stores the location of all the digital files associated with records in the DB).
This documents table today has 153,234 records.
The schema is as follows.
The table has 2 indexes:
Primary key (uid)
documenttype (nonclustered)
The stored procedure is:
SELECT
.....,
CASE ISNULL(cd.countdocs,0) WHEN 0 THEN 0 ELSE 1 END as hasdocs
.....
FROM
requests re
JOIN
employee e ON (e.employeeuid = re.employeeuid)
LEFT JOIN
(SELECT
COUNT(0) as countnotes, n.objectuid as objectuid
FROM
notes n
WHERE
n.isactive = 1
GROUP BY
n.objectuid) n ON n.objectuid = ma.authorizationuid
/* IF I COMMENT THIS LEFT JOIN THEN WORKS AMAZING FAST */
LEFT JOIN
(SELECT
COUNT(0) as countdocs, cd.objectuid
FROM
cloud_document cd
WHERE
cd.isactivedocument = 1
AND cd.entity = 'COMPANY'
GROUP BY
cd.objectuid) cd ON cd.objectuid = re.authorizationuid
JOIN ....
So I don't know if I have to add another INDEX to improve this query, or maybe the LEFT JOIN I have is not ideal.
If I run the execution plan I get this:
/*
Missing Index Details from SQLQuery7.sql - (local).db_prod (Test/test (55))
The Query Processor estimates that implementing the following index could improve the query cost by 60.8843%.
*/
/*
USE [db_prod]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[cloud_document] ([objectuid],[entity],[isactivedocument])
GO
*/
Any clue on how to solve this?
Thanks.
Just don't go out and add an index. Do some research first!
Can you grab a picture of the query plan and post it? It will show if the query is using the index or not.
Also, complete details of the table would be awesome, including primary keys, foreign keys, and any indexes. Just script them out to TSQL. A couple of sample records to boot, and we can recreate it in a test environment and help you.
Also, take a look at Glenn Berry's DMV queries.
http://sqlserverperformance.wordpress.com/tag/dmv-queries/
Good stuff like top running queries, read/write usages of indexes, etc - to name a few.
Like many things in life, it all depends on your situation!
Just need more information before we can make a judgement call.
I would actually be surprised if an index on that field helps, as it likely only has two or three values (0, 1, NULL), and indexes are not generally useful when the data has so few distinct values.
I would suspect that either your statistics are out of date or your current indexes need to be rebuilt.
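If you want to rule those out first, a couple of standard commands against the table from the question would be along these lines:

UPDATE STATISTICS dbo.cloud_document WITH FULLSCAN;   -- refresh statistics on the documents table

ALTER INDEX ALL ON dbo.cloud_document REBUILD;        -- rebuild its indexes if they are heavily fragmented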

Why does this SQL query take 8 hours to finish?

There is a simple SQL JOIN statement below:
SELECT
REC.[BarCode]
,REC.[PASSEDPROCESS]
,REC.[PASSEDNODE]
,REC.[ENABLE]
,REC.[ScanTime]
,REC.[ID]
,REC.[Se_Scanner]
,REC.[UserCode]
,REC.[aufnr]
,REC.[dispatcher]
,REC.[matnr]
,REC.[unitcount]
,REC.[maktx]
,REC.[color]
,REC.[machinecode]
,P.PR_NAME
,N.NO_NAME
,I.[inventoryID]
,I.[status]
FROM tbBCScanRec as REC
left join TB_R_INVENTORY_BARCODE as R
ON REC.[BarCode] = R.[barcode]
AND REC.[PASSEDPROCESS] = R.[process]
AND REC.[PASSEDNODE] = R.[node]
left join TB_INVENTORY as I
ON R.[inventid] = I.[id]
INNER JOIN TB_NODE as N
ON N.NO_ID = REC.PASSEDNODE
INNER JOIN TB_PROCESS as P
ON P.PR_CODE = REC.PASSEDPROCESS
The table tbBCScanRec has 556,553 records, the table TB_R_INVENTORY_BARCODE has 260,513 records, and the table TB_INVENTORY has 7,688. However, the last two tables (TB_NODE and TB_PROCESS) both have fewer than 30 records.
Incredibly, when it runs on SQL Server 2005, it takes 8 hours to return the result set.
Why does it take so much time to execute?
If the two inner joins are removed, it takes just ten seconds to finish running.
What is the matter?
There are at least two UNIQUE NONCLUSTERED INDEXes.
One is IX_INVENTORY_BARCODE_PROCESS_NODE on the table TB_R_INVENTORY_BARCODE, which covers four columns (inventid, barcode, process, and node).
The other is IX_BARCODE_PROCESS_NODE on the table tbBCScanRec, which covers three columns (BarCode, PASSEDPROCESS, and PASSEDNODE).
Well, the standard answer to questions like this:
Make sure you have all the necessary indexes in place, i.e. indexes on N.NO_ID, REC.PASSEDNODE, P.PR_CODE, and REC.PASSEDPROCESS.
Make sure that the types of the columns you join on are the same, so that no implicit conversion is necessary.
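As a rough sketch, the kind of indexes meant here would look like this (the index names are invented, and tbBCScanRec may already be partly covered by its existing IX_BARCODE_PROCESS_NODE index):

CREATE NONCLUSTERED INDEX IX_TB_NODE_NO_ID ON dbo.TB_NODE (NO_ID);
CREATE NONCLUSTERED INDEX IX_TB_PROCESS_PR_CODE ON dbo.TB_PROCESS (PR_CODE);
CREATE NONCLUSTERED INDEX IX_tbBCScanRec_Process_Node ON dbo.tbBCScanRec (PASSEDPROCESS, PASSEDNODE);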
You are working with around 500 million rows (556,553 × 30 × 30).
You probably have to add indexes on your tables.
If you are using SQL Server, you can look at the query plan to see where you are losing time.
See the documentation here: http://msdn.microsoft.com/en-us/library/ms190623(v=sql.90).aspx
The query plan will help you decide which indexes to create.
When you check the indexing, there should be clustered indexes as well - the nonclustered indexes use the clustered index, so not having one would make them much less useful. Outdated statistics could also be a problem.
However, why do you need to fetch ALL of the data? What is the purpose of that? You should have WHERE clauses restricting the result set to only what you need.

Getting record differences between 2 nearly identical tables

I have a process that consolidates 40+ identically structured databases down to one consolidated database, the only difference being that the consolidated database adds a project_id field to each table.
In order to be as efficient as possible, I'm trying to only copy/update a record from the source databases to the consolidated database if it has been added or changed. I delete outdated records from the consolidated database, and then copy in any non-existing records. To delete outdated/changed records I'm using a query similar to this:
DELETE FROM <table>
WHERE NOT EXISTS (SELECT <primary keys>
FROM <source> b
WHERE ((<b.fields = a.fields>) or
(b.fields is null and a.fields is null)))
AND PROJECT_ID = <project_id>
This works for the most part, but one of the tables in the source database has over 700,000 records, and this query takes over an hour to complete.
How can I make this query more efficient?
Use timestamps, or better yet audit tables, to identify the records that changed since time "X", and then save time "X" when the last sync started. We use that for interface feeds.
You might want to try LEFT JOIN with NULL filter:
DELETE <table>
FROM <table> t
LEFT JOIN <source> b
ON (t.Field1 = b.Field1 OR (t.Field1 IS NULL AND b.Field1 IS NULL))
AND(t.Field2 = b.Field2 OR (t.Field2 IS NULL AND b.Field2 IS NULL))
--//...
WHERE t.PROJECT_ID = <project_id>
AND b.PrimaryKey IS NULL --// any of the PK fields will do, but I really hope you do not use composite PKs
But if you are comparing all non-PK columns, then your query is going to suffer.
In this case it is better to add an UpdatedAt TIMESTAMP field (as DVK suggests) on both databases, which you could maintain with an AFTER UPDATE trigger; your sync procedure would then be much faster, given that you create an index covering the PK and UpdatedAt columns.
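A minimal sketch of that idea, with hypothetical table and column names (the trigger just stamps modified rows so the sync can pick up changes since the last run):

ALTER TABLE dbo.source_table
    ADD UpdatedAt DATETIME NOT NULL CONSTRAINT DF_source_table_UpdatedAt DEFAULT GETDATE();
GO
CREATE TRIGGER trg_source_table_touch ON dbo.source_table
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE s
    SET UpdatedAt = GETDATE()
    FROM dbo.source_table s
    JOIN inserted i ON i.pkey = s.pkey;   -- pkey stands in for the table's primary key
END;
GO
CREATE INDEX IX_source_table_UpdatedAt ON dbo.source_table (UpdatedAt);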
You can reorder the WHERE statement; it has four comparisons, put the one most likely to fail first.
If you can alter the databases/application slightly, and you'll need to do this again, a bit field that says "updated" might not be a bad addition.
I usually rewrite queries like this to avoid the NOT...
NOT IN is horrible for performance, although NOT EXISTS improves on this.
Check out this article: http://www.sql-server-pro.com/sql-where-clause-optimization.html
My suggestion...
Select your pkey column out into a working/temp table, add a flag column (INT, DEFAULT 0, NOT NULL), and index the pkey column. Mark flag = 1 if the record exists in your subquery (much quicker!).
Replace the sub-select in your main query with an EXISTS against (SELECT pkey FROM temptable WHERE flag = 0); a sketch in SQL follows the example below.
What this works out to is being able to create a list of 'not exists' values that can be used inclusively from an all-inclusive set.
Here's our total set.
{1,2,3,4,5}
Here's the existing set
{1,3,4}
We create our working table from these two sets (technically a left outer join)
(record:exists)
{1:1, 2:0, 3:1, 4:1, 5:0}
Our set of 'not existing records'
{2,5} (Select * from where flag=0)
Our product... and much quicker (indexes!)
{1,2,3,4,5} in {2,5} = {2,5}
{1,2,3,4,5} not in {1,3,4} = {2,5}
This can be done without a working table, but its use makes visualizing what's happening easier.
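For reference, a rough sketch of that working-table approach in SQL, reusing the placeholders from the question (the #work table and index names are invented):

SELECT t.<primary keys> AS pkey, 0 AS flag
INTO #work
FROM <table> t
WHERE t.PROJECT_ID = <project_id>;

CREATE INDEX IX_work_pkey ON #work (pkey);

UPDATE w
SET flag = 1
FROM #work w
JOIN <source> b ON b.pkey = w.pkey;   -- plus the column comparisons from the original query

DELETE t
FROM <table> t
WHERE t.PROJECT_ID = <project_id>
  AND EXISTS (SELECT 1 FROM #work w WHERE w.pkey = t.pkey AND w.flag = 0);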
Kris