SQL query optimization help

I have the following SQL query:
DECLARE @tempcalctbl TABLE
(
    ItemId varchar(50),
    ItemLocation varchar(50),
    ItemNo varchar(50),
    Width real,
    Unit varchar(50),
    date datetime
)
INSERT INTO @tempcalctbl
SELECT DISTINCT SubId, ItemLocation, ItemNo,
    ABS((SELECT Width FROM @temptbl a
         WHERE ItemProcess = 'P1' AND a.ItemId = c.ItemId
           AND a.ItemNo = c.ItemNo AND a.ItemLocation = c.ItemLocation)
      - (SELECT Width FROM @temptbl b
         WHERE ItemProcess = 'P2' AND b.ItemId = c.ItemId
           AND b.ItemNo = c.ItemNo AND b.ItemLocation = c.ItemLocation)) * 1000,
    Unit, date
FROM @temptbl c
GROUP BY ItemId, ItemLocation, ItemNo, Unit, date
I was wondering how to optimize this query.
The idea is to find the difference in width (P1's item minus P2's item) between ItemProcess 'P1' and 'P2' for rows with the same ItemId, ItemNo and ItemLocation.
I have around 75,000 rows, and it took more than 25 minutes to get the width differences for all the ItemIds.
I tried to use GROUP BY for the width-difference calculation, but it returned multiple rows instead of a single value, which then caused an error. By the way, I am using MS SQL Server 2008, and @tempcalctbl is a table variable that I declared in a stored procedure.

Does the following help?
INSERT INTO @tempcalctbl
SELECT P1.SubId,
       P1.ItemLocation,
       P1.ItemNo,
       ABS(P1.Width - P2.Width) * 1000 AS Width,
       P1.Unit,
       P1.date
FROM @temptbl AS P1
INNER JOIN @temptbl AS P2
    ON P1.ItemId = P2.ItemId
   AND P1.ItemNo = P2.ItemNo
   AND P1.ItemLocation = P2.ItemLocation
WHERE P1.ItemProcess = 'P1'
  AND P2.ItemProcess = 'P2'
EDIT
To make use of indexes, you will need to change your table variable to a temporary table
CREATE TABLE #temptbl
(
ItemId varchar(50),
ItemLocation varchar(50),
ItemNo varchar(50),
Width real,
Unit varchar(50),
date DATETIME,
ItemProcess varchar(10),
SubId INT
)
CREATE NONCLUSTERED INDEX Index01 ON #temptbl
(
ItemProcess ASC,
ItemId ASC,
ItemLocation ASC,
ItemNo ASC
)
INCLUDE ( SubId,Width,Unit,date)
GO
That should speed you up a little.

John Petrak's answer is the best query for this case.
If the speed is still not acceptable, maybe you can store @temptbl as a real temporary table, and create the related index on those four columns.
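A minimal sketch of that idea (copying the table variable into an indexed temp table; the names here are illustrative):
-- Materialize the table variable so the optimizer can use an index on it
SELECT * INTO #temptbl_real FROM @temptbl;
CREATE NONCLUSTERED INDEX IX_temptbl_real
    ON #temptbl_real (ItemProcess, ItemId, ItemLocation, ItemNo);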

Related

Updating 20 rows in a table is really slow

I can't figure out why updating only 21 rows in a table takes so much time.
Step 1: I'm creating #tempTable from the StagingTable (it will never have more than 20 rows of data)
CREATE TABLE #tempTable (
    ID INT NULL,
    UniqueID INT NULL,
    ReportDate VARCHAR(15) NULL,
    DOB DATETIME NULL,
    Weight VARCHAR(15) NULL,
    Height VARCHAR(15) NULL)
INSERT INTO #tempTable (
ID,
UniqueID,
ReportDate,
DOB,
Weight,
Height)
SELECT
A.ID,
A.UniqueID,
A.ReportDate,
A.DOB,
A.Weight,
A.Height
FROM [testDB].[StagingTable] as A
WHERE A.UniqueID = '12345'
Step 2. Updating FinalTable:
UPDATE [Customers].[FinalTable]
SET ID = B.ID,
UniqueID = B.UniqueID,
ReportDate = B.ReportDate,
DOB = B.DOB,
Weight = B.Weight,
Height = B.Height
FROM #tempTable AS B
WHERE [Customers].[FinalTable].[ReportDate] = B.ReportDate
AND [Customers].[FinalTable].[DOB] = B.DOB
This query takes more than 30 minutes!
Is there any way to speed up this update process? Any ideas what I might be doing wrong?
I just want to add that the FinalTable has millions of rows...
Any help would be greatly appreciated.
Thanks!
If there are only 30 or so matches, then you want an index on #tempTable(ReportDate, DOB):
create index idx_temptable_2 on #tempTable(ReportDate, DOB);

Speed up performance on UPDATE of temp table

I have a SQL Server 2012 stored procedure. I'm filling a temp table below, and that's fairly straightforward. However, after that I'm doing some UPDATE on it.
Here's my T-SQL for declaring the temp table, #SourceTable, filling it, then doing some updates on it. After all of this, I simply take this temp table and insert it into a new table we are filling with a MERGE statement which joins on DOI. DOI is a main column here, and you'll see below that my UPDATE statements get MAX/MIN on several columns based on this column as the table can have multiple rows with the same DOI.
My question is...how can I speed up filling #SourceTable or doing my updates on it? Are there any indexes I can create? I'm decent at SQL, but not the best at performance issues. I'm dealing with maybe 60,000,000 records here in the temp table. It's been running for almost 4 hours now. This is a one-time deal here for a script I'm running once.
CREATE TABLE #SourceTable
(
DOI VARCHAR(72),
FullName NVARCHAR(128), LastName NVARCHAR(64),
FirstName NVARCHAR(64), FirstInitial NVARCHAR(10),
JournalId INT, JournalVolume VARCHAR(16),
JournalIssue VARCHAR(16), JournalFirstPage VARCHAR(16),
JournalLastPage VARCHAR(16), ArticleTitle NVARCHAR(1024),
PubYear SMALLINT, CreatedDate SMALLDATETIME,
UpdatedDate SMALLDATETIME,
ISSN_e VARCHAR(16), ISSN_p VARCHAR(16),
Citations INT, LastCitationRefresh SMALLDATETIME,
LastCitationRefreshValue SMALLINT, IsInSearch BIT,
BatchUpdatedDate SMALLDATETIME, LastIndexUpdate SMALLDATETIME,
ArticleClassificationId INT, ArticleClassificationUpdatedBy INT,
ArticleClassificationUpdatedDate SMALLDATETIME,
Affiliations VARCHAR(8000),
--Calculated columns for use in importing...
RowNum SMALLINT, MinCreatedDatePerDOI SMALLDATETIME,
MaxUpdatedDatePerDOI SMALLDATETIME,
MaxBatchUpdatedDatePerDOI SMALLDATETIME,
MaxArticleClassificationUpdatedByPerDOI INT,
MaxArticleClassificationUpdatedDatePerDOI SMALLDATETIME,
AffiliationsSameForAllDOI BIT, NewArticleId INT
)
--***************************************
--CROSSREF_ARTICLES
--***************************************
--GET RAW DATA INTO SOURCE TABLE TEMP TABLE..
INSERT INTO #SourceTable
SELECT
DOI, FullName, LastName, FirstName, FirstInitial,
JournalId, LEFT(JournalVolume,16) AS JournalVolume,
LEFT(JournalIssue,16) AS JournalIssue,
LEFT(JournalFirstPage,16) AS JournalFirstPage,
LEFT(JournalLastPage,16) AS JournalLastPage,
ArticleTitle, PubYear, CreatedDate, UpdatedDate,
ISSN_e, ISSN_p,
ISNULL(Citations,0) AS Citations, LastCitationRefresh,
LastCitationRefreshValue, IsInSearch, BatchUpdatedDate,
LastIndexUpdate, ArticleClassificationId,
ArticleClassificationUpdatedBy,
ArticleClassificationUpdatedDate, Affiliations,
ROW_NUMBER() OVER(PARTITION BY DOI ORDER BY UpdatedDate DESC, CreatedDate ASC) AS RowNum,
NULL AS MinCreatedDatePerDOI, NULL AS MaxUpdatedDatePerDOI,
NULL AS MaxBatchUpdatedDatePerDOI,
NULL AS MaxArticleClassificationUpdatedByPerDOI,
NULL AS MaxArticleClassificationUpdatedDatePerDOI,
0 AS AffiliationsSameForAllDOI, NULL AS NewArticleId
FROM
CrossRef_Articles WITH (NOLOCK)
--UPDATE SOURCETABLE WITH MAX/MIN/CALCULATED VALUES PER DOI...
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
    MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
    MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
    MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
    MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN (SELECT DOI,
                   MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
                   MIN(CreatedDate) AS MinCreatedDatePerDOI,
                   MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
                   MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
                   MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI
            FROM #SourceTable
            GROUP BY DOI) AS T ON S.DOI = T.DOI
UPDATE S
SET AffiliationsSameForAllDOI = 1
FROM #SourceTable S
WHERE NOT EXISTS (SELECT 1 FROM #SourceTable S2 WHERE S2.DOI = S.DOI AND S2.Affiliations <> S.Affiliations)
This will probably be a faster way to do the update. It's hard to say without seeing the execution plan, but the original might be running the GROUP BY for every row.
with doigrouped AS
(
SELECT
MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
MIN(CreatedDate) AS MinCreatedDatePerDOI,
MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI,
DOI
FROM #SourceTable
GROUP BY DOI
)
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN doigrouped T ON S.DOI = T.DOI
If it is faster, it will be a couple of orders of magnitude faster, but that does not mean your machine will be able to process 60 million records in any reasonable period of time. If you didn't test on 100k rows first, there is no way to know how long it will take to finish.
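As a rough sanity check (a sketch; the 100k sample size is arbitrary), you could copy a slice of the data and time the update against that first:
-- Copy an arbitrary 100k-row slice into a scratch table
SELECT TOP (100000) *
INTO #SourceTable_Sample
FROM #SourceTable;
-- ...then run the CTE update above against #SourceTable_Sample to gauge the runtime.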
I suppose you can try the following:
Replace the INSERT with SELECT INTO
You don't have any indexes on #SourceTable anyway, and SELECT INTO is minimally logged, so you should get some speedup here; a sketch is below.
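For instance (a sketch only; the SELECT list is abbreviated, and the full list from the original INSERT goes here):
-- SELECT ... INTO creates and loads #SourceTable in one minimally logged step
SELECT
    DOI, FullName, LastName, FirstName, FirstInitial,
    -- (remaining plain and calculated columns exactly as in the original SELECT)
    ROW_NUMBER() OVER (PARTITION BY DOI
                       ORDER BY UpdatedDate DESC, CreatedDate ASC) AS RowNum
INTO #SourceTable
FROM CrossRef_Articles WITH (NOLOCK);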
Replace the UPDATE with SELECT INTO another table
Instead of updating #SourceTable, you can create #SourceTable_Updates with SELECT INTO (a modified version of Hogan's query):
with doigrouped AS
(
SELECT
MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
MIN(CreatedDate) AS MinCreatedDatePerDOI,
MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI,
DOI
FROM #SourceTable
GROUP BY DOI
)
SELECT
S.DOI,
MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
INTO #SourceTable_Updates
FROM #SourceTable S
INNER JOIN doigrouped T ON S.DOI = T.DOI
Then use #SourceTable JOIN-ed with #SourceTable_Updates.
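For example, a sketch of the final read (note: if #SourceTable_Updates ends up with one row per source row rather than one per DOI, add DISTINCT when building it):
-- Base rows plus their per-DOI aggregates, ready for the MERGE
SELECT S.DOI,
       S.RowNum,
       U.MinCreatedDatePerDOI,
       U.MaxUpdatedDatePerDOI   -- (add the remaining columns you need)
FROM #SourceTable S
INNER JOIN #SourceTable_Updates U
    ON S.DOI = U.DOI;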
Hope this helps
Here are a couple of things that may help the performance of your insert statement:
Does the CrossRef_Articles table have a primary key? If it does, insert the primary key (be sure it is indexed) into your temp table, and only include the fields you need for your calculations. Once the calculations are done, do a select and join your temp table back to the original table on the key field; it takes time to write all that data to disk (see the sketch after this list).
Look at your tempdb. If you have run this query multiple times, the database or log file size may be out of control.
Check whether the join fields between the two original tables are indexed.
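A sketch of that first idea (assuming CrossRef_Articles has an indexed int primary key; the name ArticleId is hypothetical):
-- Keep the temp table narrow: the key plus only the columns the calculations need
SELECT ArticleId,   -- hypothetical primary key name
       DOI, CreatedDate, UpdatedDate, BatchUpdatedDate,
       ArticleClassificationUpdatedBy, ArticleClassificationUpdatedDate, Affiliations
INTO #SourceNarrow
FROM CrossRef_Articles;
-- After computing the per-DOI values, join back on the key to pick up the wide columns
SELECT c.*, n.DOI
FROM #SourceNarrow n
INNER JOIN CrossRef_Articles c ON c.ArticleId = n.ArticleId;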

Fetch single row from table in SQL?

I have three tables in my SQL Server
1) Registration, with columns:
Reg_Id bigint, primary key and auto increment
Name nvarchar(50),
Tranx_id nvarchar(30),
Email nvarchar(30),
Username nvarchar(30),
Password nvarchar(30),
Edition_Id nvarchar(50),
Default_Id nvarchar(50),
Reg_Date datetime,
usertype nvarchar(50),
2) AllEditionPages, with columns:
Page_id bigint, primary key and auto increment
edition_date datetime,
noofpages int,
Page_no int,
image_path nvarchar(50),
Active int,
type_of_page varchar(50),
Image_Width int,
Image_Height int,
3) Edition, with columns:
id int, primary key and auto increment
edition_date datetime,
noofpages int,
XMLFile nvarchar(50),
PDFFile nvarchar(50),
PDFPrefix nvarchar(50),
type nvarchar(50),
price nvarchar(50),
reg_req nvarchar(50)
Given the tables above, I tried the SQL query below:
SELECT edi.*, aep.*, reg.*
FROM Edition as edi INNER JOIN Registration as reg
ON edi.id = reg.Edition_Id INNER JOIN AllEditionPages as aep
ON edi.edition_date = aep.edition_date
where reg.Edition_Id= edi.id
and reg.Reg_ID = 14
From this query I get this output:
id edition_date type price page-id
96 2012-07-18 00:00:00.000 free null 2503
96 2012-07-18 00:00:00.000 free null 2503
I get two rows in the output, but I want only a single row.
You can use a common table expression (CTE) in this scenario, as follows:
WITH cteTable AS
(
SELECT edi.*, aep.*, reg.*
FROM Edition as edi INNER JOIN Registration as reg
ON edi.id = reg.Edition_Id INNER JOIN AllEditionPages as aep
ON edi.edition_date = aep.edition_date
where reg.Edition_Id= edi.id
and reg.Reg_ID = 14
)
select top 1 * from cteTable
As two of your tables contain edition_date (and noofpages), you need to specify the column names, aliasing the duplicates, instead of using "*", as follows:
WITH cteTable AS
(
SELECT edi.id, edi.edition_date, edi.noofpages, edi.XMLFile, edi.PDFFile, edi.PDFPrefix, edi.type, edi.price, edi.reg_req,
    aep.Page_id, aep.edition_date AS aep_edition_date, aep.noofpages AS aep_noofpages, aep.Page_no, aep.image_path, aep.Active, aep.type_of_page, aep.Image_Width, aep.Image_Height,
    reg.Reg_Id, reg.Name, reg.Tranx_id, reg.Email, reg.Username, reg.Password, reg.Edition_Id, reg.Default_Id, reg.Reg_Date, reg.usertype
FROM Edition as edi INNER JOIN Registration as reg
ON edi.id = reg.Edition_Id INNER JOIN AllEditionPages as aep
ON edi.edition_date = aep.edition_date
where reg.Edition_Id= edi.id
and reg.Reg_ID = 14
)
select top 1 * from cteTable
You have to specify one row. From your output it looks like you have duplicate data; if that is the issue, you should use DISTINCT to get distinct rows. Otherwise, specify what exactly you need.
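For example (a sketch applied to your original query, selecting just the columns shown in your output):
SELECT DISTINCT edi.id, edi.edition_date, edi.type, edi.price, aep.Page_id
FROM Edition as edi
INNER JOIN Registration as reg ON edi.id = reg.Edition_Id
INNER JOIN AllEditionPages as aep ON edi.edition_date = aep.edition_date
WHERE reg.Reg_ID = 14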
You can use SELECT DISTINCT if the two rows have the same values. (Note that LIMIT 0,1 is MySQL syntax; the SQL Server equivalent is SELECT TOP 1, as shown above.)

SQL Server - multiple rows into one

Hope you can help...
I have a data table in this format (let's refer to this table as 'Product'):
productid property_name property_value last_updated
p0001 type p1 05-Oct-2010
p0001 name Premium 05-Oct-2010
p0001 cost 172.00 05-Oct-2010
p0002 type p3 06-Oct-2010
p0002 name standard 06-Oct-2010
p0002 cost 13.00 06-Oct-2010
(There are around 50 more properties, of which I would need at least 15 in my query.
However, I just ignore them for this example.)
I would need the data in this format:
productid type name cost
p0001 p1 Premium 172.00
p0002 p3 standard 13.00
I tried a function and a view to get this format, but it takes a good few minutes to get some 1,000 records. I wonder if anyone knows a quicker way?
What I tried:
Create function fun1 (@productid nvarchar(50)) returns @retdetails table
(
    type nvarchar(50) null,
    name nvarchar(50) null,
    cost nvarchar(50) null
)
as
begin
    declare
        @type nvarchar(50),
        @name nvarchar(50),
        @cost nvarchar(50);
    select @type = property_value from product where productid = @productid and property_name = 'type';
    select @name = property_value from product where productid = @productid and property_name = 'name';
    select @cost = property_value from product where productid = @productid and property_name = 'cost';
    if isnull(@productid, '') <> ''
    begin
        insert @retdetails
        select @type, @name, @cost;
    end;
    return;
end;
then a view
select p.productid, pd.type, pd.name, pd.cost
from (select distinct productid from product) p
cross apply dbo.fun1(p.productid) pd
The slow response might be down to the DISTINCT, but without it I get duplicate records. I would appreciate any suggestion for a quicker SQL response.
Many Thanks
You could try this pivot approach using conditional aggregation:
SELECT productid,
MAX(CASE WHEN property_name = 'type' THEN property_value END) AS type,
MAX(CASE WHEN property_name = 'name' THEN property_value END) AS name,
MAX(CASE WHEN property_name = 'cost' THEN property_value END) AS cost
FROM Product
GROUP BY productid
You could probably also do this by joining the table back onto itself a couple of times, though that might be a bit of a performance issue; a sketch follows.
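A sketch of that self-join idea (three passes over Product, one per property):
SELECT t.productid,
       t.property_value AS type,
       n.property_value AS name,
       c.property_value AS cost
FROM Product t
INNER JOIN Product n ON n.productid = t.productid AND n.property_name = 'name'
INNER JOIN Product c ON c.productid = t.productid AND c.property_name = 'cost'
WHERE t.property_name = 'type';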
Starting from SQL Server 2005, you can use the PIVOT operator:
DECLARE #TestData TABLE
(
productid VARCHAR(5) NOT NULL
,property_name VARCHAR(5) NOT NULL
,property_value VARCHAR(10)NOT NULL
,last_updated DATETIME NOT NULL
);
INSERT #TestData
SELECT 'p0001','type','p1' ,'05-Oct-2010'
UNION ALL
SELECT 'p0001','name','Premium' ,'05-Oct-2010'
UNION ALL
SELECT 'p0001','cost','172.00' ,'05-Oct-2010'
UNION ALL
SELECT 'p0002','type','p3' ,'06-Oct-2010'
UNION ALL
SELECT 'p0002','name','standard','06-Oct-2010'
UNION ALL
SELECT 'p0002','cost','13.00' ,'06-Oct-2010';
;WITH PivotSource
AS
(
SELECT a.productid
,a.property_name
,a.property_value
FROM #TestData a
)
SELECT pvt.*
    ,CONVERT(NUMERIC(8,2), pvt.cost) AS NumericCost
FROM PivotSource src
PIVOT ( MAX(src.property_value) FOR src.property_name IN ([type], [name], [cost]) ) pvt
Results:
productid  type  name      cost    NumericCost
p0001      p1    Premium   172.00  172.00
p0002      p3    standard  13.00   13.00

group by clause

I have a table called table1, and it has the columns listed below.
Suppose there are records where localamount is 20,000, 30,000, 50,000 and 100,000. Per my condition, I have to delete records from this table, grouped by SiteId, TillId, TransId and ShiftId, where the localamount exceeds 10,000; the rest of the records should remain.
My aim is to delete rows from this table where the local amount exceeds 10,000 per SiteId, TillId, TransId, ShiftId group.
SiteId varchar(10),
TillId tinyint,
ShiftId int,
TransId int,
TranDate datetime,
SettlementType varchar(5),
CreditCardNumber varchar(25),
ProductTypeCode varchar(10),
NewProductTypeCode varchar(10),
TransactionType int,
ForeignAmount money,
LocalAmount money,
ProductCode varchar(10)
I'm not sure I understand what you are saying, but couldn't you do this without a GROUP BY?
delete from table1 where LocalAmount > 10,0000[sic] and SiteId = whatever and TillId = whatever...
obviously take the [sic] out...
Assuming you want to delete the whole group where the sum is > 10000
;with cte as
(
select sum(localamount) over
(partition by siteid, tillid, transid,shiftid) as l,
* from table1
)
delete from cte where l>10000
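If you prefer an explicit GROUP BY, an equivalent sketch (assuming the whole group should be deleted once its total exceeds 10,000):
delete t
from table1 t
inner join (select SiteId, TillId, TransId, ShiftId
            from table1
            group by SiteId, TillId, TransId, ShiftId
            having sum(LocalAmount) > 10000) g
    on t.SiteId = g.SiteId and t.TillId = g.TillId
   and t.TransId = g.TransId and t.ShiftId = g.ShiftId;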