SELECT from a temp table is giving slow performance - SQL

I need some help with the query below, where the last step,
Select * from #PersonDetail order by....
is taking very long to execute - why?
Millions of records are inserted into the temp table #PersonDetail and the insert takes only a few seconds, but the final Select from that same temp table takes very long.
I created a unique clustered index on the columns used in the order by and tried many other options, but it makes no difference to the performance.
It is a big stored procedure with many temp tables, but it is this last select step that is hurting performance. Here is an example of the last step of the query:
DROP TABLE IF EXISTS #PersonDetail
CREATE TABLE #PersonDetail
(
PersonId INT NOT NULL,
Name NVARCHAR(50) NULL,
Number INT NOT NULL,
Tag NVARCHAR(50) NULL,
UserId INT NOT NULL,
NumberEncrypted VARCHAR(100),
Type NVARCHAR(255),
Status NVARCHAR(50),
CreatedDate DATETIMEOFFSET(7),
AddressDetailId NVARCHAR(50),
Category NVARCHAR(50),
PrimaryId INT,
DailyAmount MONEY,
UNIQUE (PersonId, UserId),
UNIQUE CLUSTERED(CreatedDate, UserId)
)
INSERT INTO #PersonDetail (PersonId, Name, Number, Tag, UserId, NumberEncrypted,
Type, Status, CreatedDate, AddressDetailId, Category, PrimaryId, DailyAmount)
SELECT
PersonId, Name, Number, Tag, u.UserId, NumberEncrypted,
Type, Status, CreatedDate, AddressDetailId, Category, PrimaryId, DailyAmount
FROM
#User u
JOIN
dbo.DailyAmount da WITH (NOLOCK) ON da.UserId = u.UserId
SELECT *
FROM #PersonDetail pd
ORDER BY CreatedDate, UserId

You should specify which database you are using.
In general, you should do these things:
create indexes on the join columns (DailyAmount.UserId, #User.UserId); how to create the index varies by database (see the sketch after this list);
create an index on the order by columns (CreatedDate, UserId); this also varies - in PostgreSQL, for example, one index over the two columns is better than two separate indexes;
if your data is not changing frequently, you could try a materialized view and create the indexes on the materialized view.
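For SQL Server (the question's dialect), a minimal sketch of those indexes; the index names are illustrative, and the last one is redundant here since the question already has a unique clustered index on the same columns:
-- Join column on the permanent table
CREATE NONCLUSTERED INDEX IX_DailyAmount_UserId ON dbo.DailyAmount (UserId);
-- Join column on the temp table
CREATE NONCLUSTERED INDEX IX_User_UserId ON #User (UserId);
-- ORDER BY columns on the result table (already covered by
-- UNIQUE CLUSTERED (CreatedDate, UserId) in the question)
CREATE NONCLUSTERED INDEX IX_PersonDetail_Order ON #PersonDetail (CreatedDate, UserId);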

Related

How to sum/group/sort columns in a view

I have two tables, Author_Dim and Author_Fact as below:
CREATE TABLE Author_Dim
(
TitleAuthor_ID INT IDENTITY(1,1),
Title_ID CHAR(20),
Title VARCHAR(80),
Type_Title CHAR(12),
Author_ID CHAR(20),
Last_Name VARCHAR(40),
First_Name VARCHAR(20),
Contract_Author BIT,
Author_Order INT,
PRIMARY KEY (TitleAuthor_ID)
);
GO
CREATE TABLE Author_Fact
(
Fact_ID INT IDENTITY(1,1),
TitleAuthor_ID INT,
Author_ID CHAR (20),
Price DEC(10,2),
YTD_Sales INT,
Advance DEC(10,2),
Royalty INT,
Royalty_Perc INT,
Total_Sales DEC(10,2),
Total_Advance DEC(10,2),
Total_Royalty DEC(10,2),
PRIMARY KEY (Fact_ID),
FOREIGN KEY (TitleAuthor_ID) REFERENCES Author_Dim(TitleAuthor_ID)
);
GO
I wish to create a view that gives the total royalties paid per author and then sorts it with the highest paid author shown first, i.e. it sums the Total_Royalty column, groups it by the Author_ID and then sorts the Total_Royalty in descending order.
I have the below but I'm not sure how to add the sum/group/sort functions to the view:
Create view [Total_Royalty_View] As
Select Author_Dim.Author_ID, Author_Dim.Last_Name, Author_Dim.First_Name, Author_Fact.Total_Royalty
From Author_Dim
Join Author_Fact
On Author_Fact.TitleAuthor_ID = Author_Dim.TitleAuthor_ID;
Go
In SQL everything is a table. You select from tables, and the result is again a table (consisting of columns and rows). You can store a select statement for re-use; this is called a view. You could just as well write an ad-hoc view (a subquery in a from clause). Their results are again tables.
And tables are considered unordered sets of data.
So you cannot write a view that produces an ordered set of rows.
Here is the view (unordered):
create view total_royalty_view as
select
a.author_id,
a.last_name,
a.first_name,
coalesce(r.sum_total_royalty, 0) as total_royalty
from author_dim a
left join
(
select titleauthor_id, sum(total_royalty) as sum_total_royalty
from author_fact
group by titleauthor_id
) r on r.titleauthor_id = a.titleauthor_id;
And here is how to select from it:
select *
from total_royalty_view
order by total_royalty desc;

Speed up performance on UPDATE of temp table

I have a SQL Server 2012 stored procedure. I'm filling a temp table below, which is fairly straightforward. However, after that I'm doing some UPDATEs on it.
Here's my T-SQL for creating the temp table #SourceTable, filling it, then doing some updates on it. After all of this, I simply take this temp table and insert it into a new table we are filling with a MERGE statement which joins on DOI. DOI is a main column here, and you'll see below that my UPDATE statements take MAX/MIN of several columns based on this column, as the table can have multiple rows with the same DOI.
My question is...how can I speed up filling #SourceTable or doing my updates on it? Are there any indexes I can create? I'm decent at SQL, but not the best at performance issues. I'm dealing with maybe 60,000,000 records here in the temp table. It's been running for almost 4 hours now. This is a one-time deal here for a script I'm running once.
CREATE TABLE #SourceTable
(
DOI VARCHAR(72),
FullName NVARCHAR(128), LastName NVARCHAR(64),
FirstName NVARCHAR(64), FirstInitial NVARCHAR(10),
JournalId INT, JournalVolume VARCHAR(16),
JournalIssue VARCHAR(16), JournalFirstPage VARCHAR(16),
JournalLastPage VARCHAR(16), ArticleTitle NVARCHAR(1024),
PubYear SMALLINT, CreatedDate SMALLDATETIME,
UpdatedDate SMALLDATETIME,
ISSN_e VARCHAR(16), ISSN_p VARCHAR(16),
Citations INT, LastCitationRefresh SMALLDATETIME,
LastCitationRefreshValue SMALLINT, IsInSearch BIT,
BatchUpdatedDate SMALLDATETIME, LastIndexUpdate SMALLDATETIME,
ArticleClassificationId INT, ArticleClassificationUpdatedBy INT,
ArticleClassificationUpdatedDate SMALLDATETIME,
Affiliations VARCHAR(8000),
--Calculated columns for use in importing...
RowNum SMALLINT, MinCreatedDatePerDOI SMALLDATETIME,
MaxUpdatedDatePerDOI SMALLDATETIME,
MaxBatchUpdatedDatePerDOI SMALLDATETIME,
MaxArticleClassificationUpdatedByPerDOI INT,
MaxArticleClassificationUpdatedDatePerDOI SMALLDATETIME,
AffiliationsSameForAllDOI BIT, NewArticleId INT
)
--***************************************
--CROSSREF_ARTICLES
--***************************************
--GET RAW DATA INTO SOURCE TABLE TEMP TABLE..
INSERT INTO #SourceTable
SELECT
DOI, FullName, LastName, FirstName, FirstInitial,
JournalId, LEFT(JournalVolume,16) AS JournalVolume,
LEFT(JournalIssue,16) AS JournalIssue,
LEFT(JournalFirstPage,16) AS JournalFirstPage,
LEFT(JournalLastPage,16) AS JournalLastPage,
ArticleTitle, PubYear, CreatedDate, UpdatedDate,
ISSN_e, ISSN_p,
ISNULL(Citations,0) AS Citations, LastCitationRefresh,
LastCitationRefreshValue, IsInSearch, BatchUpdatedDate,
LastIndexUpdate, ArticleClassificationId,
ArticleClassificationUpdatedBy,
ArticleClassificationUpdatedDate, Affiliations,
ROW_NUMBER() OVER(PARTITION BY DOI ORDER BY UpdatedDate DESC, CreatedDate ASC) AS RowNum,
NULL AS MinCreatedDatePerDOI, NULL AS MaxUpdatedDatePerDOI,
NULL AS MaxBatchUpdatedDatePerDOI,
NULL AS MaxArticleClassificationUpdatedByPerDOI,
NULL AS MaxArticleClassificationUpdatedDatePerDOI,
0 AS AffiliationsSameForAllDOI, NULL AS NewArticleId
FROM
CrossRef_Articles WITH (NOLOCK)
--UPDATE SOURCETABLE WITH MAX/MIN/CALCULATED VALUES PER DOI...
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
    MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
    MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
    MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
    MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN
(
    SELECT MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
           MIN(CreatedDate) AS MinCreatedDatePerDOI,
           MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
           MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
           MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI,
           DOI
    FROM #SourceTable
    GROUP BY DOI
) AS T ON S.DOI = T.DOI
UPDATE S
SET AffiliationsSameForAllDOI = 1
FROM #SourceTable S
WHERE NOT EXISTS (SELECT 1 FROM #SourceTable S2 WHERE S2.DOI = S.DOI AND S2.Affiliations <> S.Affiliations)
This will probably be a faster way to do the update-- hard to say without seeing the execution plan, but it might be running the GROUP BY for every row.
with doigrouped AS
(
SELECT
MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
MIN(CreatedDate) AS MinCreatedDatePerDOI,
MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI,
DOI
FROM #SourceTable
GROUP BY DOI
)
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN doigrouped T ON S.DOI = T.DOI
If it is faster, it will be a couple of orders of magnitude faster - but that does not mean your machine will be able to process 60 million records in any reasonable period of time. If you didn't test on 100k rows first, there is no way to know how long it will take to finish.
I suppose you can try:
Replace INSERT with SELECT INTO
You don't have any indexes on your #SourceTable anyway, and SELECT INTO is minimally logged, so you should see some speedup here; a sketch follows.
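A minimal sketch of that change, reusing the original SELECT unchanged except for the INTO clause; the untyped NULL placeholders are CAST so SELECT INTO infers the intended column types:
SELECT
DOI, FullName, LastName, FirstName, FirstInitial,
JournalId, LEFT(JournalVolume,16) AS JournalVolume,
LEFT(JournalIssue,16) AS JournalIssue,
LEFT(JournalFirstPage,16) AS JournalFirstPage,
LEFT(JournalLastPage,16) AS JournalLastPage,
ArticleTitle, PubYear, CreatedDate, UpdatedDate,
ISSN_e, ISSN_p,
ISNULL(Citations,0) AS Citations, LastCitationRefresh,
LastCitationRefreshValue, IsInSearch, BatchUpdatedDate,
LastIndexUpdate, ArticleClassificationId,
ArticleClassificationUpdatedBy,
ArticleClassificationUpdatedDate, Affiliations,
CAST(ROW_NUMBER() OVER(PARTITION BY DOI ORDER BY UpdatedDate DESC, CreatedDate ASC) AS SMALLINT) AS RowNum,
CAST(NULL AS SMALLDATETIME) AS MinCreatedDatePerDOI,
CAST(NULL AS SMALLDATETIME) AS MaxUpdatedDatePerDOI,
CAST(NULL AS SMALLDATETIME) AS MaxBatchUpdatedDatePerDOI,
CAST(NULL AS INT) AS MaxArticleClassificationUpdatedByPerDOI,
CAST(NULL AS SMALLDATETIME) AS MaxArticleClassificationUpdatedDatePerDOI,
CAST(0 AS BIT) AS AffiliationsSameForAllDOI,
CAST(NULL AS INT) AS NewArticleId
INTO #SourceTable
FROM CrossRef_Articles WITH (NOLOCK)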
Replace UPDATE with SELECT INTO another table
Instead of updating #SourceTable you can create #SourceTable_Updates with SELECT INTO (modified Hogan query):
with doigrouped AS
(
SELECT
MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
MIN(CreatedDate) AS MinCreatedDatePerDOI,
MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI,
DOI
FROM #SourceTable
GROUP BY DOI
)
SELECT
S.DOI,
MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
INTO #SourceTable_Updates
FROM #SourceTable S
INNER JOIN doigrouped T ON S.DOI = T.DOI
Then use #SourceTable joined to #SourceTable_Updates wherever the updated values are needed, for example:
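A hypothetical final read; note that the SELECT INTO above keeps one row of aggregates per #SourceTable row, so they are deduplicated per DOI before joining back:
SELECT S.*,
       U.MinCreatedDatePerDOI,
       U.MaxUpdatedDatePerDOI,
       U.MaxBatchUpdatedDatePerDOI,
       U.MaxArticleClassificationUpdatedByPerDOI,
       U.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN
(
    -- One aggregate row per DOI
    SELECT DISTINCT DOI, MinCreatedDatePerDOI, MaxUpdatedDatePerDOI,
           MaxBatchUpdatedDatePerDOI, MaxArticleClassificationUpdatedByPerDOI,
           MaxArticleClassificationUpdatedDatePerDOI
    FROM #SourceTable_Updates
) U ON U.DOI = S.DOI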
Hope this helps
Here are a couple of things that may help the performance of your insert statement:
Does the CrossRef_Articles table have a primary key? If it does, insert only the primary key (be sure it is indexed) and the fields you need for your calculations into your temp table. Once the calculations are done, do a select that joins the temp table back to the original table on the id field; it takes time to write all that data to disk. (A sketch follows the list.)
Look at your tempdb. If you have run this query multiple times, the database or log file size may be out of control.
Check whether the join fields between the two original tables are indexed.
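A minimal sketch of the first suggestion, assuming CrossRef_Articles has a single-column primary key named ArticleId (the name is hypothetical):
-- Carry only the key plus the columns the per-DOI calculations need
SELECT ArticleId, -- hypothetical primary key of CrossRef_Articles
       DOI, CreatedDate, UpdatedDate, BatchUpdatedDate,
       ArticleClassificationUpdatedBy, ArticleClassificationUpdatedDate, Affiliations
INTO #Narrow
FROM CrossRef_Articles WITH (NOLOCK)

-- ... run the aggregate/update logic against #Narrow ...

-- Fetch the wide columns only once, at the very end
SELECT c.*
FROM #Narrow n
JOIN CrossRef_Articles c ON c.ArticleId = n.ArticleId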

Does a temp variable maintain the order of rows?

DECLARE @temp_table TABLE (order_no int, username nvarchar(100))
INSERT INTO @temp_table(order_no, username)
SELECT TOP 10
user_id, username
FROM users
ORDER BY user_id
SELECT * FROM @temp_table
Will the rows in @temp_table always be ordered?
The order is never guaranteed unless you explicitly use ORDER BY.
The reason we can't guarantee the order of a table variable is that there is no default primary key. If you add an auto-incremented key to serve as the default key, results will generally follow that key - in other words, the order of insertion - but an explicit ORDER BY on it is the only real guarantee.
For example:
DECLARE @temp_table TABLE (tempid int IDENTITY(1,1), order_no int, username nvarchar(100))
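A minimal sketch putting it together (table and column names taken from the question):
DECLARE @temp_table TABLE (tempid int IDENTITY(1,1), order_no int, username nvarchar(100))

-- The ORDER BY here makes the identity values follow user_id
INSERT INTO @temp_table(order_no, username)
SELECT TOP 10 user_id, username
FROM users
ORDER BY user_id

-- An explicit ORDER BY on the key is what actually guarantees the output order
SELECT order_no, username
FROM @temp_table
ORDER BY tempid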

Delete duplicate rows with soundex?

I have two tables; one has foreign keys to the other. I want to delete duplicates from Table 1 while updating the keys on Table 2. That is: count the duplicates on Table 1, keep one key from each set of duplicates, then find the remaining duplicate keys on Table 2 and replace them with the key I kept from Table 1. Soundex seems like the best option because not all the names in Table 1 are spelled correctly. I have the basic algorithm but am not sure how to implement it. Help?
So far this is what I have:
declare #Duplicate int
declare #OriginalKey int
create table #tempTable1
(
CourseID int, -- <-- the key I want to keep or delete
SchoolID int,
CourseName nvarchar(100),
Category nvarchar(100),
IsReqThisYear bit,
yearrequired int
);
create table #tempTable2
(
CertID int,
UserID int,
CourseID int, -- <-- must stay updated with Table 1
SchoolID int,
StartDateOfCourse datetime,
EndDateOfCourse datetime,
Type nvarchar(100),
HrsOfClass float,
Category nvarchar(100),
Cost money,
PassFail varchar(20),
Comments nvarchar(1024),
ExpiryDate datetime,
Instructor nvarchar(200),
Level nchar(10)
)
--Deletes records from Table 1 not used in Table 2--
delete from Table1
where CourseID not in (select CourseID from Table2 where CourseID is not null)
insert into #tempTable1(CourseID, SchoolID, CourseName, Category, IsReqThisYear, yearrequired)
select CourseID, SchoolID, CourseName, Category, IsReqThisYear, yearrequired from Table1
insert into #tempTable2(CertID, UserID, CourseID, SchoolID, StartDateOfCourse, EndDateOfCourse, Type, HrsOfClass,Category, Cost, PassFail, Comments, ExpiryDate, Instructor, Level)
select CertID, UserID, CourseID, SchoolID, StartDateOfCourse, EndDateOfCourse, Type, HrsOfClass,Category, Cost, PassFail, Comments, ExpiryDate, Instructor, Level from Table2
select cour.CourseName, Count(cour.CourseName) cnt from Table1 as cour
join #tempTable1 as temp on cour.CourseID = temp.CourseID
where SOUNDEX(temp.CourseName) = SOUNDEX(cour.CourseName) -- <-- this is where it fails
The last part does not quite work; it gives me this error:
Error: Column 'Table1.CourseName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
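From the error, it seems every column in the select list must be aggregated or grouped; a sketch of the counting step that at least compiles (grouping on the soundex code so similar-sounding names land in one group; picking the kept key with MIN is just a guess):
SELECT SOUNDEX(cour.CourseName) AS SoundexCode,
       MIN(cour.CourseID) AS KeepCourseID, -- one key to keep per group
       COUNT(*) AS cnt
FROM Table1 AS cour
GROUP BY SOUNDEX(cour.CourseName)
HAVING COUNT(*) > 1 -- only names that actually have duplicates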
UPDATE: Some of the names in CourseName have numbers in them too. Like some are in romans and numeral format. Need to find those too but Soundex ignores numbers.

Versioning of a table

Has anybody seen any examples of a table with multiple versions of each record?
Something like: if you had the table
Person(Id, FirstName, LastName)
and you changed a record's LastName, you would have both versions of LastName (the one before and the one after the change).
I've seen this done two ways. The first is in the table itself by adding an EffectiveDate and CancelDate (or somesuch). To get the current for a given record, you'd do something like: SELECT Id, FirstName, LastName FROM Table WHERE CancelDate IS NULL
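A minimal sketch of that first approach (the surrogate RowId and the column types are assumptions; CancelDate is NULL on the current version):
CREATE TABLE Person
(
    RowId INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key, one per version
    Id INT NOT NULL,                     -- logical person id, shared by all versions
    FirstName NVARCHAR(50),
    LastName NVARCHAR(50),
    EffectiveDate DATETIME NOT NULL,
    CancelDate DATETIME NULL             -- NULL marks the current version
)

-- Current version of every person
SELECT Id, FirstName, LastName
FROM Person
WHERE CancelDate IS NULL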
The other is to have a global history table (which holds all of your historical data). The structure for such a table normally looks something like
Id bigint not null,
TableName nvarchar(50),
ColumnName nvarchar(50),
PKColumnName nvarchar(50),
PKValue bigint, -- or whatever datatype
OriginalValue nvarchar(max),
NewValue nvarchar(max),
ChangeDate datetime
Then you set a trigger on your tables (or, alternatively, add a policy that all of your Updates/Inserts will also insert into your HX table) so that the correct data is logged.
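A minimal sketch of such an update trigger, assuming a plain Person(Id, FirstName, LastName) table with Id as primary key, a history table named History whose Id column is an identity, and auditing only LastName; a real trigger would cover every audited column:
CREATE TRIGGER trPerson_History
ON Person
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- One history row per row whose LastName actually changed
    INSERT INTO History (TableName, ColumnName, PKColumnName, PKValue,
                         OriginalValue, NewValue, ChangeDate)
    SELECT 'Person', 'LastName', 'Id', d.Id,
           d.LastName, i.LastName, GETDATE()
    FROM deleted d
    JOIN inserted i ON i.Id = d.Id
    WHERE ISNULL(d.LastName, N'') <> ISNULL(i.LastName, N'');
END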
The way we're doing it (might not be the best way) is to have an active bit field, and a foreign key back to the parent record. So for general queries you would filter on active employees, but you can get the history of a single employee with their Employee ID.
declare @employees table
(
PK_emID int identity(1,1),
EmployeeID int,
FirstName varchar(50),
LastName varchar(50),
Active bit,
FK_EmployeeID int,
primary key(PK_emID)
)
insert into @employees
(
EmployeeID,
FirstName,
LastName,
Active,
FK_EmployeeID
)
select 1, 'David', 'Engle', 1,null
union all
select 2, 'Amy', 'Edge', 0,null
union all
select 2, 'Amy','Engle',1,2