How can I improve the performance of this stored procedure?

Okay so I made some changes to a stored procedure that we have, and it now takes 3 hours to run (it used to take only 10 minutes before). I have a temp table called #tCustomersEmail. It has a column called OrderDate, which contains a lot of null values. I want to replace those null values with data from another database on a different server. So here's what I have:
I create another temp table:
Create Table #tSouth
(
CustID char(10),
InvcDate nchar(10)
)
Which I populate with this data:
INSERT INTO #tSouth(CustID, InvcDate)
SELECT DISTINCT
[CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
where EXISTS (SELECT CustomerNumber FROM #tCustomersEmail H WHERE I.CustID = H.CustomerNumber)
group BY I.CustID
Then I take the data from #tSouth and update the OrderDate in the #tCustomersEmail table, as long as the CustomerNumber matches up, and the OrderDate is null:
UPDATE #tCustomersEmail
SET OrderDate = InvcDate
FROM #tCustomersEmail
INNER JOIN #tSouth ON #tCustomersEmail.CustomerNumber = [#tSouth].CustID
where #tCustomersEmail.OrderDate IS null
Making those changes caused the stored procedure to take FOR-EV-ER (Sandlot reference!)
So what am I doing wrong?
BTW I create indexes on my temp tables after I create them like so:
create clustered index idx_Customers ON #tCustomersEmail(CustomerNumber)
CREATE clustered index idx_CustSouthW ON #tSouth(CustID)

Try skipping the #tSouth table and using this query instead:
UPDATE a
SET OrderDate = (select max(InvcDate) from D3.SouthW.dbo.uc_InvoiceLine I
where a.customernumber = custid)
FROM #tCustomersEmail a
WHERE orderdate is null
I don't think the index will help you in this example.

Maybe use a table variable instead of a temp table?
declare @temp table
(
CustID char(10),
InvcDate nchar(10)
)
insert into @temp
...
That definitely will increase the performance!
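For completeness, a minimal sketch of what the full table-variable version might look like: this simply re-uses the INSERT from the question retargeted at @temp (with the redundant DISTINCT dropped, since the GROUP BY already deduplicates), nothing new beyond that.
declare @temp table
(
CustID char(10),
InvcDate nchar(10)
)
INSERT INTO @temp (CustID, InvcDate)
SELECT
[CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
where EXISTS (SELECT CustomerNumber FROM #tCustomersEmail H WHERE I.CustID = H.CustomerNumber)
group BY I.CustID
Whether this actually helps depends on the row counts involved; a later answer below notes that indexed temp tables often perform better once the row count grows.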

DISTINCT isn't needed if you have a GROUP BY. Given that you are going across a linked server, I don't like the EXISTS; I would change that part so it limits the number of rows at that point. Change it to:
INSERT INTO #tSouth(CustID, InvcDate)
SELECT
[CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
where I.CustID in
(SELECT CustomerNumber
FROM #tCustomersEmail H
WHERE H.OrderDate IS null )
group BY I.CustID
EDIT: Looking closer, are you sure uc_InvoiceLine should be used? It looks like there should be a parent table to that one that would have the date and fewer rows.
Also, you can skip the one temp table by doing the update directly:
UPDATE #tCustomersEmail
SET OrderDate = InvcDate
FROM #tCustomersEmail
INNER JOIN (SELECT
[CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
where I.CustID in
(SELECT CustomerNumber
FROM #tCustomersEmail H
WHERE H.OrderDate IS null )
group BY I.CustID) Invoices
ON #tCustomersEmail.CustomerNumber = Invoices.CustID

It's difficult to predict the behaviour of complex queries involving tables on a linked server, because the local server has no access to statistics for the remote table, and can end up with a poor query plan because of this - it will work on the assumption that the remote table has either 1 or 100 rows.
If this wasn't bad enough, the result of the bad plan can be to pull the entire remote table over the wire into local temp space and work on it there. If the remote table is very large, this can be a major performance overhead.
It might be worth trying to simplify the linked server query to minimise the chances of the entire table being returned over the wire (as has already been mentioned, you don't need both DISTINCT and GROUP BY):
INSERT INTO #tSouth(CustID, InvcDate)
SELECT [CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
group BY I.CustID
leaving the rest of the query unchanged.
However, because of the aggregate this may still bring the whole table back to the local server - you'll need to test to find out. Your best bet may be to encapsulate this logic in a view in the SouthW database, if you're able to create objects in it, then reference that from your SP code.
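A rough sketch of that last suggestion, assuming you are allowed to create objects in the SouthW database on D3 (the view name here is made up for illustration):
CREATE VIEW dbo.vw_LastInvoiceByCustomer
AS
SELECT CustID, MAX(InvcDate) AS InvcDate
FROM dbo.uc_InvoiceLine
GROUP BY CustID;
The procedure would then populate #tSouth from D3.SouthW.dbo.vw_LastInvoiceByCustomer, the intent being that the aggregation runs on the remote server and only one row per customer crosses the wire.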

Related

How to use Join with like operator and then casting columns

I have 2 tables with these columns:
CREATE TABLE #temp
(
Phone_number varchar(100) -- example data: "2022033456"
)
CREATE TABLE orders
(
Addons ntext -- example data: "Enter phone:2022033456<br>Thephoneisvalid"
)
I have to join these two tables using LIKE, as the phone numbers are not in the same format. A little background: I am joining the #temp table on its phone number to the orders table on its Addons value, and then in the WHERE condition I try to match them again. Here is my code, but the results I am getting are not accurate: it returns no data. I don't know what I am doing wrong. I am using SQL Server.
select
*
from
order_no as n
join
orders as o on n.order_no = o.order_no
join
#temp as t on t.phone_number like '%'+ cast(o.Addons as varchar(max))+'%'
where
t.phone_number = '%' + cast(o.Addons as varchar(max)) + '%'
You cannot use a LIKE statement in the JOIN condition like that. Please provide more information on your tables. You have to convert the format of one of the phone fields to match the other phone field's format in order to join on it.
I think your join condition is in the wrong order. Because your question explicitly mentions two tables, let's stick with those:
select *
from orders o JOIN
#temp t
on cast(o.Addons as varchar(max)) like '%' + t.phone_number + '%';
It has been so long since I dealt with the text data type (in SQL Server), that I don't remember if the cast() is necessary or not.
Instead of trying to do everything in a single top-level query, you should apply a transformation projection to your orders table and use that as a subquery, which will make the query easier to understand.
Using the CHARINDEX function will make this a lot easier; however, it does not support ntext, so you will need to change your schema to use nvarchar(max) instead (which you should be doing anyway, as ntext is deprecated). Alternatively you can use CONVERT( nvarchar(max), someNTextValue ), though this will reduce performance because you won't be able to use any indexes on your ntext values - but this query will run slowly anyway.
SELECT
orders2.*,
CASE WHEN orders2.PhoneStart > 0 AND orders2.PhoneEnd > 0 THEN
-- skip past the 'Enter phone:' label itself so only the number is returned
SUBSTRING( orders2.Addons, orders2.PhoneStart + LEN('Enter phone:'), orders2.PhoneEnd - orders2.PhoneStart - LEN('Enter phone:') )
ELSE
NULL
END AS ExtractedPhoneNumber
FROM
(
SELECT
orders.*, -- never use `*` in production, so replace this with the actual columns in your orders table
-- Addons is assumed to have been converted to nvarchar(max) here; otherwise wrap it in CONVERT(nvarchar(max), Addons)
CHARINDEX('Enter phone:', Addons) AS PhoneStart,
CHARINDEX('<br>Thephoneisvalid', Addons, CHARINDEX('Enter phone:', Addons) ) AS PhoneEnd
FROM
orders
) AS orders2
I suggest converting the above into a VIEW or CTE so you can directly query it in your JOIN expression:
CREATE VIEW ordersWithPhoneNumbers AS
-- copy and paste the above query here, then execute the batch to create the view, you only need to do this once.
Then you can use it like so:
SELECT
* -- again, avoid the use of the star selector in production use
FROM
ordersWithPhoneNumbers AS o2 -- this is the above query as a VIEW
INNER JOIN order_no ON o2.order_no = order_no.order_no
INNER JOIN #temp AS t ON o2.ExtractedPhoneNumber = t.phone_number
Actually, I take back my previous remark about performance - if you add an index to the ExtractedPhoneNumber column of the ordersWithPhoneNumbers view then you'll get good performance.

T-SQL Stored Procedure: Performance of select count(*) vs. select count([uniqueId])

So, I'm looking at a stored procedure here, which has more than one line like the following pseudocode:
if(select count(*) > 0)
...
on tables having a unique id (or identifier, for making it more general).
Now, in terms of performance, is it more performant to change this clause
to
if(select count([uniqueId]) > 0)
...
where uniqueId is, e.g., an Idx containing double values?
An example:
Consider a table like Idx (double) | Name (String) | Address (String)
Now the 'Idx' is a foreign key which I want to join in a stored procedure.
So, in terms of performance: what is better here?
if(select count(*) > 0)
...
or
if(select count(Idx) > 0)
...
Or does the SQL engine change select count(*) to select count(Idx) internally, so we do not have to bother about this? Because at first sight, I'd say that select count(Idx) would be more performant.
The two are slightly different. count(*) counts rows. count([uniqueid]) counts the number of non-NULL values for uniqueid. Because a unique constraint allows a NULL value, SQL Server actually needs to read the column. This could add microseconds of time to a query, particularly if the page with the id is not already in memory. This also gives SQL Server more opportunities to optimize count(*).
As @lad2025 writes in a comment, the performant solution is to use IF (EXISTS ...).
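A minimal sketch of that pattern (Table1, Table2 and Idx are just placeholder names); EXISTS lets SQL Server stop at the first matching row instead of counting them all:
IF EXISTS (SELECT 1 FROM Table1 t1 JOIN Table2 t2 ON t2.Idx = t1.Idx)
BEGIN
    PRINT 'at least one matching row'
END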
SELECT t1.*
FROM Table1 t1
JOIN Table2 t2 ON t2.idx = t1.idx
will give you only the rows in t1 that match an idx value in Table2. I'm not sure there is a good reason to do an if(select count...).
If you are really interested in the performance of something like this, just create a temp table with a million rows and give it a go:
CREATE TABLE #TempTable (id int identity, txt varchar(50))
GO
INSERT #TempTable (txt) VALUES (@@IDENTITY)
GO 1000000

SQL returns all rows if "IN" subquery fails

So recently I caused myself a lot of trouble by coding too fast, not testing my code very well, and publishing a stored procedure with an error the compiler doesn't catch. I have a query like this:
--Lets say I have a table SomeTable (SomeId, CreateDate, UpdatedDate)
DECLARE @Temp TABLE (Id INT);
INSERT INTO @Temp (Id)
SELECT SomeId FROM SomeTable WHERE CreateDate > SomeDate;
UPDATE SomeTable
SET UpdatedDate = GETDATE()
WHERE SomeId IN (SELECT TempId FROM @Temp);
Now, the compiler didn't catch that I was trying to select a column in my subquery that didn't exist in my temp table (which I get). However, in my mind, if the subquery fails in this case, wouldn't it make sense that no rows should be returned? When I ran this, EVERY ROW EVER in SomeTable was updated, which, as you can imagine, was a huge pain to fix. Does anyone have a clue why it does this? I can't seem to find the answer online.
You think you wrote this query:
UPDATE SomeTable s
SET UpdatedDate = GETDATE()
WHERE s.SomeId IN (SELECT t.TempId FROM @Temp t);
But, if t.TempId doesn't exist, then SQL Server (and all other databases) interpret this as:
UPDATE SomeTable s
SET UpdatedDate = GETDATE()
WHERE s.SomeId IN (SELECT s.TempId FROM @Temp t);
If you are getting all rows, then there is a column SomeTable.TempId that has the same value as SomeTable.SomeId. My guess is that you used the same column name in the two places. So, your query is essentially SomeTable.SomeId = SomeId -- which is true whenever SomeId is not NULL.
If you follow the best practice of always qualifying your column names with a table alias, then you would not have had this problem. If you had done that, then the first version of the query would have failed, probably with an intelligible error.
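A sketch of the fully qualified version in valid T-SQL (the alias has to be introduced via a FROM clause when used in an UPDATE; the names are the ones from the question):
UPDATE s
SET UpdatedDate = GETDATE()
FROM SomeTable s
WHERE s.SomeId IN (SELECT t.Id FROM @Temp t);
-- With the alias in place, writing t.TempId instead of t.Id fails at compile
-- time with an invalid column name error instead of silently correlating back
-- to the outer table.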

Fast calculation of partial sums on a large SQL Server table

I need to calculate a total of a column up to a specified date on a table that currently has over 400k rows and is poised to grow further. I found the SUM() aggregate function to be too slow for my purpose, as I couldn't get it faster than about 1500ms for a sum over 50k rows.
Please note that the code below is the fastest implementation I have found so far. Notably filtering the data from CustRapport and storing it in a temporary table brought me a 3x performance increase. I also experimented with indexes, but they usually made it slower.
I would however like the function to be at least an order of magnitude faster. Any idea on how to achieve that? I have stumbled upon http://en.wikipedia.org/wiki/Fenwick_tree. However, I would rather have the storage and calculation processed within SQL Server.
CustRapport and CustLeistung are Views with the following definition:
ALTER VIEW [dbo].[CustLeistung] AS
SELECT TblLeistung.* FROM TblLeistung
WHERE WebKundeID IN (SELECT WebID FROM XBauAdmin.dbo.CustKunde)
ALTER VIEW [dbo].[CustRapport] AS
SELECT MainRapport.* FROM MainRapport
WHERE WebKundeID IN (SELECT WebID FROM XBauAdmin.dbo.CustKunde)
Thanks for any help or advice!
ALTER FUNCTION [dbo].[getBaustellenstunden]
(
@baustelleID int,
@datum date
)
RETURNS
@ret TABLE
(
Summe float
)
AS
BEGIN
declare @rapport table
(
id int null
)
INSERT INTO @rapport select WebSourceID from CustRapport
WHERE RapportBaustelleID = @baustelleID AND RapportDatum <= @datum
INSERT INTO @ret
SELECT SUM(LeistungArbeit)
FROM CustLeistung INNER JOIN @rapport as r ON LeistungRapportID = r.id
WHERE LeistungArbeit is not null
AND LeistungInventarID is null AND LeistungArbeit > 0
RETURN
END
Execution plan:
http://s23.postimg.org/mxq9ktudn/execplan1.png
http://s23.postimg.org/doo3aplhn/execplan2.png
Here is some general advice I can provide now, until you provide more information.
I updated the query, since it was pulling from views, to pull straight from the tables:
INSERT INTO #ret
SELECT
SUM(LeistungArbeit)
FROM (
SELECT DISTINCT WebID FROM XBauAdmin.dbo.CustKunde
) Web
INNER JOIN dbo.TblLeistung ON TblLeistung.WebKundeID=web.webID
INNER JOIN dbo.MainRapport ON MainRapport.WebKundeID=web.webID
AND TblLeistung.LeistungRapportID=MainRapport.WebSourceID
AND MainRapport.RapportBaustelleID = @baustelleID
AND MainRapport.RapportDatum <= @datum
WHERE TblLeistung.LeistungArbeit is not null
AND TblLeistung.LeistungInventarID is null
AND TblLeistung.LeistungArbeit > 0
Get rid of the table variable. They have their use, but I switch to temp tables when I get over 100 records; indexed temp tables simply perform better in my experience (see the sketch after this list of suggestions).
Update your select to the above query and retest performance
Check and ensure there are indexes on every column referenced in the query. If you use Show Actual Execution Plan, SQL Server will help identify where indexes would be useful.
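A rough sketch of the indexed temp table idea, assuming the aggregation is moved out of the table-valued function into a stored procedure (temp tables can't be created inside a T-SQL function, which is presumably why a table variable was used in the first place):
CREATE TABLE #rapport (id int NULL)

INSERT INTO #rapport (id)
SELECT WebSourceID
FROM CustRapport
WHERE RapportBaustelleID = @baustelleID
AND RapportDatum <= @datum

-- the clustered index gives the optimizer statistics and a cheap join path
CREATE CLUSTERED INDEX idx_rapport_id ON #rapport (id)

SELECT SUM(LeistungArbeit) AS Summe
FROM CustLeistung
INNER JOIN #rapport r ON LeistungRapportID = r.id
WHERE LeistungArbeit IS NOT NULL
AND LeistungInventarID IS NULL
AND LeistungArbeit > 0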

Paging in Pervasive SQL

How do I do paging in Pervasive SQL (version 9.1)? I need to do something similar to:
//MySQL
SELECT foo FROM table LIMIT 10, 10
But I can't find a way to define an offset.
Tested query in PSQL:
select top n *
from tablename
where id not in(
select top k id
from tablename
)
where n = the number of records you need to fetch at a time,
and k = a multiple of n (e.g. n = 5; k = 0, 5, 10, 15, ...).
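For instance, the third page of 5 rows would be n = 5 and k = 10 (id is assumed to be the key you page on; without an ORDER BY the pages are only stable if the engine always returns rows in the same order):
select top 5 *
from tablename
where id not in(
select top 10 id
from tablename
)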
Our paging required that we be able to pass in the current page number and page size (along with some additional filter parameters) as variables. Since a select top @page_size doesn't work in MS SQL, we came up with creating a temporary or variable table to assign each row's primary key an identity that can later be filtered on for the desired page number and size.
Note that if you have a GUID primary key or a compound key, you just have to change the objectid column on the temporary table to a uniqueidentifier, or add the additional key columns to the table.
The downside to this is that it still has to insert all of the results into the temporary table, but at least it is only the keys. This works in MS SQL, but should be able to work for any DB with minimal tweaks.
declare @page_number int, @page_size int -- add any additional search parameters here

--create the temporary table with the identity column and the id
--of the record that you'll be selecting. This is an in-memory
--table, so if the number of rows you'll be inserting is greater
--than 10,000, then you should use a temporary table in tempdb
--instead. To do this, use
--CREATE TABLE #temp_table (row_num int IDENTITY(1,1), objectid int)
--and change all the references to @temp_table to #temp_table
DECLARE @temp_table TABLE (row_num int IDENTITY(1,1), objectid int)

--insert into the temporary table the ids of the records
--we want to return. It's critical to make sure the order by
--reflects the order of the records to return so that the row_num
--values are set in the correct order and we are selecting the
--correct records based on the page
INSERT INTO @temp_table (objectid)
/* Example: Select that inserts
records into the temporary table
SELECT personid FROM person WITH (NOLOCK)
inner join degree WITH (NOLOCK) on degree.personid = person.personid
WHERE person.lastname = @last_name
ORDER BY person.lastname asc, person.firstname asc
*/

--get the total number of rows that we matched
DECLARE @total_rows int
SET @total_rows = @@ROWCOUNT

--calculate the total number of pages based on the number of
--rows that matched and the page size passed in as a parameter
DECLARE @total_pages int

--add the @page_size - 1 to the total number of rows to
--calculate the total number of pages. This is because sql
--always rounds down for division of integers
SET @total_pages = (@total_rows + @page_size - 1) / @page_size

--return the result set we are interested in by joining
--back to the @temp_table and filtering by row_num
/* Example: Selecting the data to return. If the
insert was done properly, then you should always be joining the table
that contains the rows to return to the objectid column on the
@temp_table
SELECT person.* FROM person WITH (NOLOCK)
INNER JOIN @temp_table tt ON person.personid = tt.objectid
*/
--return only the rows in the page that we are interested in
--and order by the row_num column of the @temp_table to make sure
--we are selecting the correct records
WHERE tt.row_num < (@page_size * @page_number) + 1
AND tt.row_num > (@page_size * @page_number) - @page_size
ORDER BY tt.row_num
I face this problem in MS SQL too... no LIMIT or row-number functions. What I do is insert the keys for my final query result (or sometimes the entire list of fields) into a temp table with an identity column... then I delete from the temp table everything outside the range I want... then use a join against the keys and the original table to bring back the items I want. This works if you have a nice unique key - if you don't, well... that's a design problem in itself.
An alternative with slightly better performance is to skip the deleting step and just use the row numbers in your final join. Another performance improvement is to use the TOP operator so that, at the very least, you don't have to grab the stuff past the end of what you want.
So... in pseudo-code... to grab items 80-89...
create table #keys (rownum int identity(1,1), key varchar(10))
insert #keys (key)
select TOP 89 key from myTable ORDER BY whatever
delete #keys where rownum < 80
select <columns> from #keys join myTable on #keys.key = myTable.key
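And the "skip the delete" variant mentioned above, still pseudo-code against the same assumed tables:
create table #keys (rownum int identity(1,1), key varchar(10))
insert #keys (key)
select TOP 89 key from myTable ORDER BY whatever
select <columns>
from #keys
join myTable on #keys.key = myTable.key
where #keys.rownum >= 80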
I ended up doing the paging in code. I just skip the first records in a loop.
I thought I had come up with an easy way of doing the paging, but it seems that Pervasive SQL doesn't allow ORDER BY clauses in subqueries. However, this should work on other DBs (I tested it on Firebird):
select *
from (select top [rows] * from
(select top [rows * pagenumber] * from mytable order by id)
order by id desc)
order by id
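For example, with 10 rows per page, page 2 becomes [rows] = 10 and [rows * pagenumber] = 20 (derived-table aliases added here, since most engines require them):
select *
from (select top 10 *
      from (select top 20 * from mytable order by id) as newest
      order by id desc) as pagerows
order by id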