Improve Speed of Script with Indexing - sql

I have the following script that used to be fine, but since our user base has expanded to almost a million members it has become very sluggish. I want to improve it and need expert assistance to make it faster, whether through code changes, new indexes, or both. Here is the code:
IF @MODE = 'CREATEREQUEST'
BEGIN
    IF NOT EXISTS (SELECT * FROM FriendRequest WHERE FromMemberID = @FromMemberID AND ToMemberID = @ToMemberID)
        AND NOT EXISTS (SELECT * FROM MemberConnection WHERE MemberID = @FromMemberID AND ConnMemberID = @ToMemberID)
        AND NOT EXISTS (SELECT * FROM MemberConnection WHERE MemberID = @ToMemberID AND ConnMemberID = @FromMemberID)
    BEGIN
        INSERT INTO FriendRequest (
            FromMemberID,
            ToMemberID,
            RequestMsg,
            OnHold)
        VALUES (
            @FromMemberID,
            @ToMemberID,
            @RequestMsg,
            @OnHold)
    END
    BEGIN
        UPDATE Member SET FriendRequestCount = (FriendRequestCount + 1) WHERE MemberID = @ToMemberID
    END
END
Any assistance you can provide would be greatly appreciated.

You can use SQL Server Management Studio to view the indexes on a table. If, for example, your FriendRequest table has a PK on FriendRequestID, you will be able to see that you have a clustered index on that field. You can have only one clustered index per table, and the table records are stored in that order.
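If you prefer T-SQL over the SSMS UI, the built-in sp_helpindex procedure lists a table's indexes as well; the table name below assumes the FriendRequest table from your script:
EXEC sp_helpindex 'dbo.FriendRequest'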
You might want to try adding non-clustered indexes to your foreign key fields. You could use the New Index wizard, or else syntax like this:
CREATE NONCLUSTERED INDEX [IX_FromMemberID] ON [dbo].[FriendRequest] (FromMemberID)
CREATE NONCLUSTERED INDEX [IX_ToMemberID] ON [dbo].[FriendRequest] (ToMemberID)
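The NOT EXISTS checks against MemberConnection filter on two columns at once, so a composite index there may help as well; this is only a sketch, assuming the column names from your procedure:
CREATE NONCLUSTERED INDEX [IX_MemberConnection_MemberID_ConnMemberID] ON [dbo].[MemberConnection] (MemberID, ConnMemberID)
Because both columns are compared with equality, the same index serves both directions of the check.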
But you should be aware that indexing will generally slow down the INSERT and UPDATE operations you showed in your code, while it will tend to speed up SELECT queries that can use the indexed fields (see Execution Plans).
You can try the Database Engine Tuning Advisor to get an idea of some possible indexes and their effect on your application's workload.
Indexing is a large subject and you may wish to take it a small step at a time.

Related

SQL Query on table with 30 million records

I have been having problems building a table in my local SQL Server. Originally it was causing the tempdb database to fill up and throw an exception. The query has a lot of joins and outer applies, so to find specifically where the problem lay I did a SELECT on the first table in the query to see how long it took; that was fast, so I added the next table that was the first join in the query and re-ran it, and continued like this until I found the table that stalled.
I found the problem (or at least the first problem) was with the shipper_container table. This table is huge, and just running a SELECT on it throws a System.OutOfMemoryException while the results are displayed (it has only 5 columns). The output cuts out at 16 million records, but the table has 30 million rows and is 1.2 GB in size. That doesn't seem so big to me that SQL Server Management Studio couldn't handle it.
Using a WHERE clause to collect values between 1 January and 10 January 2015 still resulted in a query that had been running for over 5 minutes and was still executing when I cancelled it. I have also added indexes on each of the selected columns, and this did not improve performance either.
Here is the SQL query. You can see I have commented out the other predicates that have yet to be added in further joins and outer applies.
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
DECLARE @Shipper_Key INT = NULL
DECLARE @Part_Key INT = NULL
SET @startDate = '2015-01-01'
SET @endDate = '2015-01-10'
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
INSERT Shipped_Container
(
Ship_Date,
Invoice_Quantity,
Shipper_No,
Serial_No,
Truck_Key,
Shipper_Key
)
SELECT
S.Ship_Date,
SC.Quantity,
S.Shipper_No,
SC.Serial_No,
S.Truck_Key,
S.Shipper_Key
FROM Shipper AS S
JOIN Shipper_Line AS SL
--ON SL.PCN = S.PCN
ON SL.Shipper_Key = S.Shipper_Key
JOIN Shipper_Container AS SC
--ON SC.PCN = SL.PCN
ON SC.Shipper_Line_Key = SL.Shipper_Line_Key
WHERE S.Ship_Date >= @startDate AND S.Ship_Date <= @endDate
AND S.Shipper_Key = ISNULL(@Shipper_Key, S.Shipper_Key)
AND SL.Part_Key = ISNULL(@Part_Key, SL.Part_Key)
The server instance runs on the local network - could this be an issue? I really have minimal experience with this and would appreciate help that is as detailed and clear as possible. Often in SQL forums people jump right into technical details I don't follow so well.
Don't do a SELECT ... FROM yourtable in SQL Server Management Studio when it returns
hundreds of thousands or millions of rows. 1 GB of data gets a lot bigger when the system has to draw and show it on screen in the Management Studio results grid.
The server instance is run on the local network
When you do a SELECT ... FROM yourtable in SSMS, the server must send all the data to your laptop/desktop. This is quite a lot of unneeded pressure on the network.
It should not be an issue for the INSERT, because everything stays on the server. However, staying on the server does not mean it will be fast if your data model is not good enough.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
You may get dirty reads if you use that... It may be better to remove it unless you know why it is there and why you need it.
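If dirty reads are not acceptable, the simplest change is to drop that line so the session falls back to the default isolation level, or to set it back explicitly (a minimal sketch):
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- the SQL Server default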
I have also added indexes on each of the select parameters and this did not increase performance either
If you mean indexes on:
S.Ship_Date,
SC.Quantity,
S.Shipper_No,
SC.Serial_No,
S.Truck_Key,
S.Shipper_Key
What are their definitions?
If they are individual single-column indexes, you can drop the indexes on SC.Quantity, S.Shipper_No, SC.Serial_No and S.Truck_Key; they are not used.
The ones on Ship_Date and Shipper_Key may be useful. It all depends on your model and existing primary keys (which you need to describe; see below).
It would help us give a more accurate answer if you could tell us:
the relation between your 3 tables (which fields link A to B and in which direction)
the primary key on each of your 3 tables
a complete list of all your indexes (and their columns) on your 3 tables
If none of your indexes is useful, or if they are missing, the query will most likely read all 3 tables in full and try to match them. Because the data is pretty big, there is not enough memory to process it and tempdb is used to store the intermediate data.
For now I will suppose that Shipper_Key + PCN is the primary key on each table.
I think you can try that:
You can create an index on S.Ship_Date
CREATE INDEX Shipper_Ship_Date ON Shipper (Ship_Date) -- name and definition subject to change according to your primary key
The query optimizer may not use the indexes (if they exist) with a WHERE clause like this:
AND S.Shipper_Key = ISNULL(@Shipper_Key, S.Shipper_Key)
AND SL.Part_Key = ISNULL(@Part_Key, SL.Part_Key)
You can use instead:
AND (S.Shipper_Key = @Shipper_Key OR @Shipper_Key IS NULL)
AND (SL.Part_Key = @Part_Key OR @Part_Key IS NULL)
It would help to have indexes on Shipper_Key and PCN
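As a sketch only (the table and column names are taken from the ON clauses in your query; PCN only matters if you restore the commented-out join conditions), the join-key indexes could look like this:
CREATE NONCLUSTERED INDEX IX_Shipper_Line_Shipper_Key ON Shipper_Line (Shipper_Key)
CREATE NONCLUSTERED INDEX IX_Shipper_Container_Shipper_Line_Key ON Shipper_Container (Shipper_Line_Key)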
Finally
As I already said above, we need to know more about your data model (CREATE TABLE ...), primary keys and indexes (CREATE INDEX ...). You can build a model at http://sqlfiddle.com/ with all 3 CREATE TABLE statements and their indexes, then add the link here.
In SSMS, you can right click a table and go to Script Table as / CREATE To / New Query Window and add the result here or in http://sqlfiddle.com/. Only keep the CREATE TABLE ... part down to the first GO.
You can then do the same thing for all your indexes.
You should also add a copy of your query plan.
In SSMS, go to the Query menu / Display Estimated Execution Plan, then right click the plan to save it as XML (XML is better). It is only an estimate and it won't execute the whole query, so it should be pretty fast.
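If you prefer T-SQL over the menu, you can also capture the estimated plan as XML; SET SHOWPLAN_XML has to be the only statement in its batch, and while it is ON the following batches are compiled but not executed:
SET SHOWPLAN_XML ON
GO
-- put the INSERT ... SELECT here; its plan is returned as XML, but the statement does not run
GO
SET SHOWPLAN_XML OFF
GO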

SQL Query very slow - Suddenly

I have a SQL stored procedure that was running perfectly (0.2 seconds execution or less); suddenly, today, it is taking more than 10 minutes.
I see that the issue comes from a LEFT JOIN of a documents table (which stores the location of all the digital files associated with records in the DB).
This documents table currently has 153,234 records.
The table has 2 indexes:
Primary key (uid)
documenttype (nonclustered)
The stored procedure is:
SELECT
.....,
CASE ISNULL(cd.countdocs,0) WHEN 0 THEN 0 ELSE 1 END as hasdocs
.....
FROM
requests re
JOIN
employee e ON (e.employeeuid = re.employeeuid)
LEFT JOIN
(SELECT
COUNT(0) as countnotes, n.objectuid as objectuid
FROM
notes n
WHERE
n.isactive = 1
GROUP BY
n.objectuid) n ON n.objectuid = ma.authorizationuid
/* IF I COMMENT THIS LEFT JOIN THEN WORKS AMAZING FAST */
LEFT JOIN
(SELECT
COUNT(0) as countdocs, cd.objectuid
FROM
cloud_document cd
WHERE
cd.isactivedocument = 1
AND cd.entity = 'COMPANY'
GROUP BY
cd.objectuid) cd ON cd.objectuid = re.authorizationuid
JOIN ....
So I don't know whether I have to add another index to improve this query, or maybe the LEFT JOIN I have is not ideal.
If I run the execution plan I get this:
/*
Missing Index Details from SQLQuery7.sql - (local).db_prod (Test/test (55))
The Query Processor estimates that implementing the following index could improve the query cost by 60.8843%.
*/
/*
USE [db_prod]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[cloud_document] ([objectuid],[entity],[isactivedocument])
GO
*/
Any clue on how to solve this?
Thanks.
Just don't go out and add an index. Do some research first!
Can you grab a picture of the query plan and post it? It will show if the query is using the index or not.
Also, complete details of the table would be awesome, including Primary Keys, Foreign Keys, and any indexes. Just script them out to TSQL. A couple of sample records to boot and we can recreate it in a test environment and help you.
Also, take a look at Glenn Berry's DMV queries.
http://sqlserverperformance.wordpress.com/tag/dmv-queries/
Good stuff like top running queries, read/write usages of indexes, etc - to name a few.
Like many things in life, it all depends on your situation!
Just need more information before we can make a judgement call.
I would actually be surprised if an index on that field helps as it likely only has two or three values (0,1, null) and indexes are not generally useful when the data has so few values.
I would suspect that either your statistics are out of date or your current indexes need to be rebuilt.
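If that is the case, a minimal sketch of the fix would be something like this (the table name is taken from the question; adjust it to your schema):
UPDATE STATISTICS dbo.cloud_document WITH FULLSCAN  -- refresh the statistics
ALTER INDEX ALL ON dbo.cloud_document REBUILD       -- rebuild the existing indexes (this also updates their statistics)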

How to optimise slow SQL query

I need help optimising this query. In the stored procedure this part executes for 1 hour (the whole procedure needs 2 hours to execute). The procedure works with a large amount of data. The query works with two temporary tables, both of which have indexes:
create unique clustered index #cx_tDuguje on #tDuguje (Partija, Referenca, Konto, Valuta, DatumValute)
create nonclustered index #cx_tDuguje_1 on #tDuguje (Partija, Valuta, Referenca, Konto, sIznos)
create unique clustered index #cx_tPotrazuje on #tPotrazuje (Partija, Referenca, Konto, Valuta, DatumValute)
create nonclustered index #cx_tPotrazuje_1 on #tPotrazuje (Partija, Valuta, Referenca, Konto, pIznos)
And this is a query:
select D.Partija,
D.Referenca,
D.Konto,
D.Valuta,
D.DatumValute DatumZad,
NULLIF(MAX(COALESCE(P.DatumValute, @NextDay)), @NextDay) DatumUpl,
MAX(D.DospObaveze) DospObaveze,
MAX(D.LimitMatZn) LimitMatZn
into #dwkKasnjenja_WNT
from #tDuguje D
left join #tPotrazuje P on D.Partija = P.Partija
AND D.Valuta = p.Valuta
AND D.Referenca = p.Referenca
AND D.Konto = P.Konto
and P.pIznos < D.sIznos and D.sIznos <= P.Iznos
WHERE 1=1
AND D.DatumValute IS NOT NULL
GROUP BY D.Partija, D.Referenca, D.Konto, D.Valuta, D.DatumValute
I have an execution plan, but I am not able to post it here.
Just an idea: If you are permitted to do so, try to change the business logic first.
Maybe you could constrain the result set to only include data from, say, a point in time back that is meaningful.
How far back in time do your account data reach? Do you really need to include all data all the way back from good old 1999?
Maybe you can say
D.DatumValute >= 'Jan 1 2010'
or similar, in your WHERE clause.
This might create a much smaller intermediate result set for your complicated JOIN, which will then run faster.
If you can't do this, maybe do a
SELECT TOP 1000 ... ORDER BY DatumValute DESC query, which might run faster, and then, if the user really needs it, perform the slow-running query in a second step.
Difficult to say without having an execution plan or some hints about the number of rows in each table.
Replace this index
create nonclustered index #cx_tPotrazuje_1 on #tPotrazuje (Partija, Valuta, Referenca, Konto, pIznos)
by
create nonclustered index #cx_tPotrazuje_1 on #tPotrazuje (Partija, Valuta, Referenca, Konto, sIznos, pIznos)
Creating indexes is an expensive process and it can sometimes be slow, depending on the workload of the instance and on the columns for which the index is created.
Furthermore, it's very difficult to say what you need to optimise without an execution plan and without knowing something about the data types of the columns involved in the index creation.
For example, the data types of the columns Partija, Referenca, Konto, Valuta and DatumValute are not clear.
You should tell us the data types of the columns involved in the creation of your indexes.

Selecting the most optimal query

I have a table in an Oracle database which is called my_table, for example. It is a log-type table. It has an incremental column named "id" and a "registration_number" which is unique per registered user. Now I want to get the latest change for each registered user, so I wrote the queries below to accomplish this task:
First version:
SELECT t.*
FROM my_table t
WHERE t.id =
(SELECT MAX(id) FROM my_table t_m WHERE t_m.registration_number = t.registration_number
);
Second version:
SELECT t.*
FROM my_table t
INNER JOIN
( SELECT MAX(id) m_id FROM my_table GROUP BY registration_number
) t_m
ON t.id = t_m.m_id;
My first question is: which of the above queries is recommended, and why? And the second: if there are sometimes about 70,000 inserts into this table, but mostly the number of inserted rows varies between 0 and 2,000, is it reasonable to add an index to this table?
An analytical query might be the fastest way to get the latest change for each registered user:
SELECT registration_number, id
FROM (
SELECT
registration_number,
id,
ROW_NUMBER() OVER (PARTITION BY registration_number ORDER BY id DESC) AS IDRankByUser
FROM my_table
)
WHERE IDRankByUser = 1
As for indexes, I'm assuming you already have an index by registration_number. An additional index on id will help the query, but maybe not by much and maybe not enough to justify the index. I say that because if you're inserting 70K rows at one time the additional index will slow down the INSERT. You'll have to experiment (and check the execution plans) to figure out if the index is worth it.
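One option worth testing, since every variant of the query looks up the latest id per registration_number, is a composite index rather than a separate index on id alone; this is only a sketch with an assumed index name:
CREATE INDEX my_table_regnum_id_ix ON my_table (registration_number, id);
Such an index can answer the MAX(id) ... GROUP BY registration_number subquery from the index alone, but it still adds overhead to the large inserts, so check the execution plan before keeping it.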
To check which query is faster, you should look at the execution plan and cost; that will give you a fair idea. But I agree with Ed Gibbs' solution, as analytic functions make the query run much faster.
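For example (assuming you have access to DBMS_XPLAN), you can view the plan and cost of the first version like this:
EXPLAIN PLAN FOR
SELECT t.*
FROM my_table t
WHERE t.id =
  (SELECT MAX(id) FROM my_table t_m WHERE t_m.registration_number = t.registration_number);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);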
If you feel this table is going to grow very big then I would suggest partitioning the table and using local indexes. They will definitely help you build faster queries.
In cases where you want to insert lots of rows, indexes slow down the insertion because the index also has to be updated with each insert (I would not recommend an index on id). There are 2 solutions I can think of for this:
You can drop the index before the insertion and then recreate it afterwards (see the sketch after this list).
Use reverse key indexes. Check this link: http://oracletoday.blogspot.in/2006/09/there-is-option-to-create-index.html. A reverse key index can impact your queries a bit, so there is a trade-off.
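A minimal sketch of the first option, with an assumed index name:
DROP INDEX my_table_regnum_idx;
-- ... perform the large batch of inserts here ...
CREATE INDEX my_table_regnum_idx ON my_table (registration_number);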
If you are looking for the fastest solution, and there is a real need to maintain a list of the last activity for each user, then the most robust solution is to maintain a separate table with unique registration_number values and the rowid of the last record created in the log table.
E.g. (for demonstration only, not checked for syntax validity; sequences and triggers omitted):
create table my_log(id number not null, registration_number number, action_id varchar2(100))
/
create table last_user_action(registration_number number not null, last_action rowid)
/
alter table last_user_action
add constraint pk_last_user_action primary key (registration_number) using index
/
create or replace procedure write_log(p_reg_num number, p_action_id varchar2)
is
v_row_id rowid;
begin
insert into my_log(registration_number, action_id)
values(p_reg_num, p_action_id)
returning rowid into v_row_id;
update last_user_action
set last_action = v_row_id
where registration_number = p_reg_num;
end;
/
With such a schema you can simply query the last action for every user with good performance:
select
  lua.registration_number, l.action_id
from
last_user_action lua,
my_log l
where
l.rowid (+) = lua.last_action
Rowid is a physical storage identity that directly addresses a storage block, and you can't rely on it after moving to another server, restoring from backups, etc. But if you need such functionality it's simple to add the id column from the my_log table to last_user_action too, and use one or the other depending on requirements.

SQL Deadlocks Insert Update statments in Stored Procedure

I have a very basic table called Titles as below,
TitleID - auto identity and PK
UserID - reference key to User table
Title - varchar
IsPrimary - bit
There is only one index, the PK clustered index on TitleID.
Now I'm inserting records into this table via a stored procedure within a ReadCommitted transaction.
The stored procedure inserts the record with IsPrimary = 1 and updates all of the user's other titles to 0:
INSERT INTO Titles(...)
VALUES (...)
UPDATE T
SET IsPrimary = 0
FROM Titles T
WHERE T.UserID = @UserID AND T.JobTitle != @Title
The moment I test this in a multi-user scenario I hit deadlock issues. If I remove the UPDATE command from the stored proc then everything works perfectly fine...
I tried creating non-clustered indexes on the lookup column and also tried the WITH (ROWLOCK) hint in the UPDATE statement, but nothing seems to work.
When I run the SQL statements and view the estimated execution plan, I can see that both statements update the clustered index, and I'm thinking this is where it fails in the multi-user scenario...
I believe it's a fairly simple scenario, and lots of people must have implemented this kind of behaviour in high-transaction systems, but I can't find anything on how to approach/solve this issue, and your help will be appreciated.
Thank you.
Your deadlock is probably on the index resource.
In the execution plan look for bookmark/key lookups and create a non-clustered index covering those fields - that way the 'read' of the data for the UPDATE will not clash with the 'write' of the INSERT.
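For the Titles table in the question, that could look something like the sketch below; note that the question's UPDATE filters on JobTitle while the table description lists the column as Title, so use whichever name actually exists in your schema:
CREATE NONCLUSTERED INDEX IX_Titles_UserID_JobTitle ON dbo.Titles (UserID, JobTitle)
With such an index, the 'read' side of the UPDATE can seek on UserID and JobTitle instead of scanning the clustered index that the concurrent INSERT is writing to, which is the clash described above.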