SQL Query Optimization (after table structure change)

I am just wondering if anyone can see a better solution to this issue.
I previously had a flat (wide) table to work with that contained multiple columns. This table has now been changed to a dynamic table containing just 2 columns (statistic_name and value).
I have amended my code to use subqueries to return the same results as before, but I am worried the performance is going to be terrible with real live data. This is based on the execution plan, which shows a considerable difference between the 2 versions.
See below for a very simplified example of my issue -
CREATE TABLE dbo.TEST_FLAT
(
ID INT,
TEST1 INT,
TEST2 INT,
TEST3 INT,
TEST4 INT,
TEST5 INT,
TEST6 INT,
TEST7 INT,
TEST8 INT,
TEST9 INT,
TEST10 INT,
TEST11 INT,
TEST12 INT
)
CREATE TABLE dbo.TEST_DYNAMIC
(
ID INT,
STAT VARCHAR(6),
VALUE INT
)
CREATE TABLE dbo.TEST_URNS
(
ID INT
)
-- OLD QUERY
SELECT D.[ID], D.TEST1, D.TEST2, D.TEST3, D.TEST4, D.TEST5, D.TEST6, D.TEST7, D.TEST8, D.TEST9, D.TEST10, D.TEST11, D.TEST12
FROM [dbo].[TEST_URNS] U
INNER JOIN [dbo].[TEST_FLAT] D
ON D.ID = U.ID
-- NEW QUERY
SELECT U.[ID],
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST1') AS TEST1,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST2') AS TEST2,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST3') AS TEST3,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST4') AS TEST4,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST5') AS TEST5,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST6') AS TEST6,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST7') AS TEST7,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST8') AS TEST8,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST9') AS TEST9,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST10') AS TEST10,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST11') AS TEST11,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST12') AS TEST12
FROM [dbo].[TEST_URNS] U
Note this is in SQL Server 2008 R2 and will be part of a stored procedure; the flat version of the table contained hundreds of thousands of records (900k or so at last count).
Thanks in advance.

Create an index on the STAT column of TEST_DYNAMIC, for quick lookups.
But first consider redesigning TEST_DYNAMIC, changing STAT varchar(6) to STAT_ID int (referencing a lookup table).
Then, on TEST_DYNAMIC, create an index on STAT_ID, which will run quite a bit faster than an index on a text field.
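A sketch of that redesign (the table, constraint, and index names here are illustrative, not from the original post):
-- Hypothetical lookup table for the statistic names
CREATE TABLE dbo.TEST_STATS
(
STAT_ID INT IDENTITY(1,1) PRIMARY KEY,
STAT_NAME VARCHAR(6) NOT NULL UNIQUE
)
-- TEST_DYNAMIC now stores the integer id instead of the text name
CREATE TABLE dbo.TEST_DYNAMIC
(
ID INT,
STAT_ID INT NOT NULL REFERENCES dbo.TEST_STATS (STAT_ID),
VALUE INT
)
-- Integer index for the per-statistic lookups
CREATE INDEX IX_TEST_DYNAMIC_STAT ON dbo.TEST_DYNAMIC (STAT_ID, ID) INCLUDE (VALUE)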

Create your TEST_DYNAMIC and TEST_URNS tables like this:
CREATE TABLE [dbo].[TEST_DYNAMIC](
[ID] [int] IDENTITY(1,1) NOT NULL,
[STAT] [varchar](50) NOT NULL,
[VALUE] [int] NOT NULL,
CONSTRAINT [PK_TEST_DYNAMIC] PRIMARY KEY CLUSTERED
(
[ID]
))
CREATE TABLE dbo.TEST_URNS
(
[ID] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_TEST_URNS] PRIMARY KEY CLUSTERED
(
[ID]
))
If you notice after a period of time that performance becomes poor, then you can check the index fragmentation:
SELECT a.index_id, name, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('dbo.TEST_DYNAMIC'),
NULL, NULL, NULL) AS a
JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;
GO
Then you can rebuild the index like so:
ALTER INDEX PK_TEST_DYNAMIC ON dbo.TEST_DYNAMIC
REBUILD;
GO
For details please see https://msdn.microsoft.com/en-us/library/ms189858.aspx
Also, I like @Brett Lalonde's suggestion to change STAT to an int.

The only way to really know is to try it out. In general, modern hardware should be able to support either query with little noticeable impact on performance, as long as you index both tables correctly (you'll probably need an index on ID and STAT).
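For example, something like this (the index name is illustrative):
-- Covers the correlated lookups: seek on (ID, STAT), VALUE read from the index leaf
CREATE INDEX IX_TEST_DYNAMIC_ID_STAT ON dbo.TEST_DYNAMIC (ID, STAT) INCLUDE (VALUE)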
If you have 900K entities and 12 attributes, you have around 10 million rows; that should be fine on a decent server. Eventually, you may run into performance problems if you add many records every month.
The bigger problem is that the example queries you posted are almost certainly not what you'll end up running against your real data. If you have to filter on and/or compare TEST5 with TEST6 in your derived table, you don't benefit from the additional indexing you could have if they were "real" columns.
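As a hedged sketch of the kind of rewrite that avoids one correlated subquery per column, conditional aggregation reads TEST_DYNAMIC once per ID (assuming at most one row per ID/STAT pair):
SELECT U.ID,
MAX(CASE WHEN D.STAT = 'TEST1' THEN D.VALUE END) AS TEST1,
MAX(CASE WHEN D.STAT = 'TEST2' THEN D.VALUE END) AS TEST2
-- ... repeat for TEST3 through TEST12
FROM dbo.TEST_URNS U
LEFT JOIN dbo.TEST_DYNAMIC D ON D.ID = U.ID
GROUP BY U.ID
This touches TEST_DYNAMIC once instead of twelve times per row.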
You could then come full circle and implement your EAV table as an indexed view.

Related

Need a helping hand with a somewhat complicated query

I have a table 'Tasks' with the following structure
[TaskId],[CompanyId], [Year], [Month], [Value]
220,1,2018,1,50553.32
220,2,2018,2,222038.12
and another table, named 'UsersCopmpanies', where users have permissions to particular companies:
[UserId], [CompanyId]
1,1
The thing is, task no. 220 was moved between companies: in January the task belonged to companyId = 1, and then in February it belonged to companyId = 2.
According to the table 'UsersCopmpanies', the user does not have permission to companyId = 2.
What I need to do is get both rows from the table 'Tasks', except for the field Value, because the user does not have permission.
Expected result should be:
[TaskId], [CompanyId], [Year], [Month], [Value]
220,1,2018,1,50553.32
220,2,2018,2,(NULL or something else, for example the string 'lack of permission')
You can use a left join:
select t.TaskId, t.CompanyId, t.Year, t.Month,
(case when uc.CompanyId is not null then Value end) as Value
from tasks t left join
UsersCompanies uc
on uc.CompanyId = t.CompanyId and uc.UserId = 1;
I think this query using LEFT JOIN can work as you expected:
CREATE TABLE #MyTasks
(TaskId int,
CompanyId int,
YearCol varchar(50),
MonthCol varchar(50),
SomeValue varchar(50)
);
GO
INSERT INTO #MyTasks
SELECT 220,1,2018,1,50553.32
UNION
SELECT 220,2,2018,2,222038.12
CREATE TABLE #MyUsersCopmpanies
(UserId int PRIMARY KEY,
CompanyId varchar(50)
);
GO
INSERT INTO #MyUsersCopmpanies
SELECT 1,1
DECLARE @MyUserParam INT = 1;
SELECT #MyTasks.TaskId, #MyTasks.CompanyId, #MyTasks.YearCol, #MyTasks.MonthCol,
CASE WHEN #MyUsersCopmpanies.UserId IS NOT NULL THEN #MyTasks.SomeValue ELSE 'lack of permission' END AS 'ValueTaskByPermissions'
FROM #MyTasks
LEFT JOIN #MyUsersCopmpanies ON #MyUsersCopmpanies.CompanyId = #MyTasks.CompanyId AND #MyUsersCopmpanies.UserId = @MyUserParam;
DROP TABLE #MyTasks
DROP TABLE #MyUsersCopmpanies
RESULT:
TaskId CompanyId YearCol MonthCol ValueTaskByPermissions
220 1 2018 1 50553.32
220 2 2018 2 lack of permission
Some code:
SELECT t.taskid, t.companyid, t.year, t.month,
(CASE WHEN u.companyid IS NOT NULL THEN t.value ELSE 'lack of permission' END) AS ValueData
FROM `x_task` t LEFT JOIN x_userscopmpanies u ON u.companyid = t.companyid AND u.userid = 1

Increase MS SQL transaction performance

If any information is missing, I will attach it on request.
Workspace
I have a database running on MS SQL 2012 Standard Edition, of this kind:
tables:
users(id, softId (not unique), birthdate)
rows: 10.5 million
indexes: all three columns, birthdate(clustered)
docs(docId, userId, creationDate, deleteDate, lastname, forename, classificationId)
rows: 23 million
indexes: lastname, forename, docId, creationDate, userID(clustered)
notice: in this specific case the names are related to the docs, not to the userId
classifications(id, description)
rows: 200
three "data" tables (data1, data2, data3)
rows: 10, 13 and 0.3 million
indexes: docIds
relations:
users to docs: 1 to n
classifications to docs: 1 to n
docs to data-tables: 1 to n
To select the complete records, I currently use the following statements:
Server-Execution-Time 16 seconds
SELECT * FROM (
select * from docs
where userID in (
select distinct userID from users where softId like '...'
)
) as doc
LEFT JOIN users on users.userID = doc.userId
LEFT JOIN classifications on classifications.id = doc.classificationId
LEFT JOIN data1 on data1.docId = doc.docId
LEFT JOIN data2 on data2.docId = doc.docId
LEFT JOIN data3 on data3.docId = doc.docId;
Updated - now 15 seconds
SELECT
docID, classificationId, classificationDescription,
userId, softId, forename, lastname, birthdate,
data1.id, data1.date, data2.id, data2.date, data3.id, data3.date
FROM docs doc
JOIN users on users.userID = doc.userId AND softId like '...'
LEFT JOIN classifications on classifications.id = doc.classificationId
LEFT JOIN data1 on data1.docId = doc.docId
LEFT JOIN data2 on data2.docId = doc.docId
LEFT JOIN data3 on data3.docId = doc.docId;
[execution plan screenshot omitted]
Server-Execution-Time 17 seconds
DECLARE @userIDs table( id bigint );
DECLARE @docIDs table( id bigint );
insert into @userIDs select userID from users where softId like '...';
insert into @docIDs select docId from docs where userId in ( select id from @userIDs);
SELECT * FROM users where userID in ( select id from @userIDs);
SELECT * FROM docs where docID in (select id from @docIDs);
SELECT * FROM data1 where data1.docId in (select id from @docIDs);
SELECT * FROM data2 where data2.docId in (select id from @docIDs);
SELECT * FROM data3 where data3.docId in (select id from @docIDs);
GO
Updated - now 14 seconds
DECLARE @userIDs table( id bigint, softId varchar(12), birthdate varchar(8) );
DECLARE @docIDs table( id bigint, classification bigint, capture_date datetime, userId bigint, lastname varchar(50), forename varchar(50) );
INSERT INTO @userIDs select userID, softId, birthdate from users where softId like '...';
INSERT INTO @docIDs select docID, classification, capture_date, userID, lastname, forename from docs where userID in ( select id from @userIDs);
SELECT * FROM @userIDs;
SELECT * FROM @docIDs;
SELECT [only needed fields] FROM data1 where docID in (select id from @docIDs);
SELECT [only needed fields] FROM data2 where docID in (select id from @docIDs);
SELECT [only needed fields] FROM data3 where docID in (select id from @docIDs);
[execution plan screenshots omitted: userIds, docIds, userIds output, data1]
General Updates
@AntonínLejsek suggested defining the docId of documents as the clustered index and the pkId as non-clustered. This changed the execution times as follows:
Join-Statement: -1 second
Multi-Select-Statement: -5 seconds
I checked the indexes again and changed the included columns; the execution times are now:
Join-Statement: 4 seconds
Multi-Select-Statement: 6 seconds
The "simple" question
Does somebody have suggestions to reduce the execution time?
I would get rid of the first subquery and just do the necessary work on the users table:
SELECT *
FROM docs JOIN
users
ON users.userID = doc.userId AND softId LIKE '...' LEFT JOIN
. . .
The logic in the IN is unnecessary if you are doing a JOIN anyway.
Note: This might not help much, because your query appears to be returning lots of data, both in columns and rows.
I see two different databases in the plan; I would try to test it in one database first.
The database design is weird. You have a clustered index on birthdate. As it is not unique, the database has to add a hidden 4-byte uniquifier to make it unique, so you have a 12-byte key in every nonclustered index, which is space- and performance-inefficient. You do not even have id included in the nonclustered indexes, so it has to be looked up, which wastes time. In most cases you should cluster on the primary key, and that should be id.
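A sketch of that change, with assumed index and constraint names (the real ones are not shown in the question):
DROP INDEX IX_users_birthdate ON users; -- assumed name of the clustered birthdate index
ALTER TABLE users ADD CONSTRAINT PK_users PRIMARY KEY CLUSTERED (id); -- cluster on the key instead
CREATE NONCLUSTERED INDEX IX_users_birthdate ON users (birthdate); -- keep birthdate searchable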
--Deleted-- since softId is almost unique, this paragraph became irrelevant.
Add clustered indexes on your table variables by defining primary keys.
DECLARE @userIDs table( id bigint primary key, softId varchar(12), birthdate varchar(8) );
DECLARE @docIDs table( id bigint primary key, classification bigint, capture_date datetime, userId bigint, lastname varchar(50), forename varchar(50) );

SQL: Create a full record from 2 tables

I've got the following DB structure (simplified as much as possible for clarity):
Table "entry" ("id" integer primary key)
Table "fields" ("name" varchar primary key, and others)
Table "entry_fields" ("entryid" integer primary key, "name" varchar primary key, "value")
I would like to get, for a given "entry.id", the detail of this entry, i.e. all the "entry_fields" linked to this entry, in a single SQL query.
An example would be better perhaps:
"fields":
"result"
"output"
"code"
"command"
"entry" contains:
id : 842
id : 850
"entry_fields" contains:
entryid : 842, name : "result", value : "ok"
entryid : 842, name : "output", value : "this is an output"
entryid : 842, name : "code", value : "42"
entryid : 850, name : "result", value : "ko"
entryid : 850, name : "command", value : "print ko"
The wanted output would be:
| id | command | output | code | result |
| 842 | NULL | "this is an output" | 42 | ok |
| 850 | "print ko" | NULL | NULL | ko |
The aim is to be able to add a "field" without changing anything in the "entry" table structure.
I tried something like:
SELECT e.*, (SELECT name FROM fields) FROM entry AS e
but Postgres complains:
ERROR: more than one row returned by a subquery used as an expression
Hope someone can help me!
Solution as requested
While stuck with this unfortunate design, the fastest query would be with crosstab(), provided by the additional module tablefunc. Ample details in this related answer:
PostgreSQL Crosstab Query
For the question asked:
SELECT * FROM crosstab(
$$SELECT e.id, ef.name, ef.value
FROM entry e
LEFT JOIN entry_fields ef
ON ef.entryid = e.id
AND ef.name = ANY ('{result,output,code,command}'::text[])
ORDER BY 1, 2$$
,$$SELECT unnest('{result,output,code,command}'::text[])$$
) AS ct (id int, result text, output text, code text, command text);
Database design
If you don't have a huge number of different fields, it will be much simpler and more efficient to merge all three tables into one simple table:
CREATE TABLE entry (
entry_id serial PRIMARY KEY
,field1 text
,field2 text
, ... more fields
);
Fields without values can be NULL. NULL storage is very cheap (basically 1 bit per column in the NULL bitmap):
How much disk-space is needed to store a NULL value using postgresql DB?
Do nullable columns occupy additional space in PostgreSQL?
Even if you have hundreds of different columns, and only a few are filled per entry, this will still use much less disk space.
Your query becomes trivial:
SELECT entry_id, result, output, code, command
FROM entry;
If you have too many columns [1], and that's not just a misguided design (often, this can be folded into much fewer columns), consider the data types hstore or json / jsonb (in Postgres 9.4) for EAV storage.
[1] Per the Postgres "About" page:
Maximum Columns per Table 250 - 1600 depending on column types
Consider this related answer with alternatives:
Use case for hstore against multiple columns
And this question about typical use cases / problems of EAV structures on dba.SE:
Is there a name for this database structure?
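For illustration, a minimal jsonb sketch of the EAV data from this question (Postgres 9.4+; the table name entry_jsonb is made up):
CREATE TABLE entry_jsonb (
entry_id serial PRIMARY KEY
,fields jsonb
);
INSERT INTO entry_jsonb (fields) VALUES
('{"result": "ok", "output": "this is an output", "code": "42"}')
,('{"result": "ko", "command": "print ko"}');
-- ->> extracts a field as text; missing keys come back as NULL
SELECT entry_id
,fields->>'command' AS command
,fields->>'output' AS output
,fields->>'code' AS code
,fields->>'result' AS result
FROM entry_jsonb;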
Dynamic SQL:
CREATE TABLE fields (name varchar(100) PRIMARY KEY)
INSERT INTO FIELDS VALUES ('RESULT')
INSERT INTO FIELDS VALUES ('OUTPUT')
INSERT INTO FIELDS VALUES ('CODE')
INSERT INTO FIELDS VALUES ('COMMAND')
CREATE TABLE ENTRY_fields (ENTRYID INT, name varchar(100), VALUE VARCHAR(100), CONSTRAINT PK PRIMARY KEY(ENTRYID, name))
INSERT INTO ENTRY_fields VALUES(842, 'RESULT', 'OK')
INSERT INTO ENTRY_fields VALUES(842, 'OUTPUT', 'THIS IS AN OUTPUT')
INSERT INTO ENTRY_fields VALUES(842, 'CODE', '42')
INSERT INTO ENTRY_fields VALUES(850, 'RESULT', 'KO')
INSERT INTO ENTRY_fields VALUES(850, 'COMMAND', 'PRINT KO')
CREATE TABLE ENTRY (ID INT PRIMARY KEY)
INSERT INTO ENTRY VALUES(842)
INSERT INTO ENTRY VALUES(850)
DECLARE @COLS NVARCHAR(MAX), @SQL NVARCHAR(MAX)
select @Cols = stuff((select ', ' + quotename(dt)
from (select DISTINCT name as dt
from fields) X
FOR XML PATH('')),1,2,'')
PRINT @COLS
SET @SQL = 'SELECT * FROM (SELECT id, f.name, value
from fields F CROSS join ENTRY LEFT JOIN entry_fields ef on ef.name = f.name AND ID = ef.ENTRYID
) Y PIVOT (max(value) for name in ('+ @Cols +'))PVT '
--print @SQL
exec (@SQL)
If you think your values are going to be constant in the fields table:
SELECT * FROM (SELECT id, f.name ,value
from fields F CROSS join ENTRY LEFT JOIN entry_fields ef on ef.name = f.name AND ID = ef.ENTRYID
) Y PIVOT (max(value) for name in ([CODE], [COMMAND], [OUTPUT], [RESULT]))PVT
A query that may work with PostgreSQL:
SELECT ID, MAX(CODE) as CODE, MAX(COMMAND) as COMMAND, MAX(OUTPUT) as OUTPUT, MAX(RESULT) as RESULT
FROM (SELECT ID,
CASE WHEN f.name = 'CODE' THEN VALUE END AS CODE,
CASE WHEN f.name = 'COMMAND' THEN VALUE END AS COMMAND,
CASE WHEN f.name = 'OUTPUT' THEN VALUE END AS OUTPUT,
CASE WHEN f.name = 'RESULT' THEN VALUE END AS RESULT
from fields F CROSS join ENTRY LEFT JOIN entry_fields ef on ef.name = f.name AND ID = ENTRYID
) Y
GROUP BY ID
The subquery (SELECT name FROM fields) would return 4 rows; you can't stuff 4 rows into 1 in SQL. You can use crosstab, which I'm not familiar enough with to answer. Or you can use a crude query like this:
SELECT e.*,
(SELECT value FROM entry_fields AS ef WHERE name = 'command' AND ef.entryid = e.id) AS command,
(SELECT value FROM entry_fields AS ef WHERE name = 'output' AND ef.entryid = e.id) AS output,
(SELECT value FROM entry_fields AS ef WHERE name = 'code' AND ef.entryid = e.id) AS code,
(SELECT value FROM entry_fields AS ef WHERE name = 'result' AND ef.entryid = e.id) AS result
FROM entry AS e

How to optimize this long running sqlite3 query for finding duplicates?

I've got this rather insane query for finding all but the FIRST record with a duplicate value. It takes a long time to run on 38,000 records: about 50 seconds.
UPDATE exr_exrresv
SET mh_duplicate = 1
WHERE exr_exrresv._id IN
(
SELECT F._id
FROM exr_exrresv AS F
WHERE Exists
(
SELECT PHONE_NUMBER,
Count(_id)
FROM exr_exrresv
WHERE exr_exrresv.PHONE_NUMBER = F.PHONE_NUMBER
AND exr_exrresv.PHONE_NUMBER != ''
AND mh_active = 1 AND mh_duplicate = 0
GROUP BY exr_exrresv.PHONE_NUMBER
HAVING Count(exr_exrresv._id) > 1)
)
AND exr_exrresv._id NOT IN
(
SELECT Min(_id)
FROM exr_exrresv AS F
WHERE Exists
(
SELECT PHONE_NUMBER,
Count(_id)
FROM exr_exrresv
WHERE exr_exrresv.PHONE_NUMBER = F.PHONE_NUMBER
AND exr_exrresv.PHONE_NUMBER != ''
AND mh_active = 1
AND mh_duplicate = 0
GROUP BY exr_exrresv.PHONE_NUMBER
HAVING Count(exr_exrresv._id) > 1
)
GROUP BY PHONE_NUMBER
);
Any tips on how to optimize it or how I should begin to go about it? I've checked out the query plan but I'm really not sure how to begin improving it. Temp tables? Better query?
Here is the explain query plan output:
0|0|0|SEARCH TABLE exr_exrresv USING INTEGER PRIMARY KEY (rowid=?) (~12 rows)
0|0|0|EXECUTE LIST SUBQUERY 0
0|0|0|SCAN TABLE exr_exrresv AS F (~500000 rows)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE exr_exrresv USING AUTOMATIC COVERING INDEX (PHONE_NUMBER=? AND mh_active=? AND mh_duplicate=?) (~7 rows)
1|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|EXECUTE LIST SUBQUERY 2
2|0|0|SCAN TABLE exr_exrresv AS F (~500000 rows)
2|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 3
3|0|0|SEARCH TABLE exr_exrresv USING AUTOMATIC COVERING INDEX (PHONE_NUMBER=? AND mh_active=? AND mh_duplicate=?) (~7 rows)
3|0|0|USE TEMP B-TREE FOR GROUP BY
2|0|0|USE TEMP B-TREE FOR GROUP BY
Any tips would be much appreciated. :)
Also, I am using Ruby to build the SQL query, so if it makes more sense for the logic to leave SQL and be written in Ruby, that's possible.
The schema is as follows, and you can use sqlfiddle here: http://sqlfiddle.com/#!2/2c07e
_id INTEGER PRIMARY KEY
OPPORTUNITY_ID varchar(50)
CREATEDDATE varchar(50)
FIRSTNAME varchar(50)
LASTNAME varchar(50)
MAILINGSTREET varchar(50)
MAILINGCITY varchar(50)
MAILINGSTATE varchar(50)
MAILINGZIPPOSTALCODE varchar(50)
EMAIL varchar(50)
CONTACT_PHONE varchar(50)
PHONE_NUMBER varchar(50)
CallFromWeb varchar(50)
OPPORTUNITY_ORIGIN varchar(50)
PROJECTED_LTV varchar(50)
MOVE_IN_DATE varchar(50)
mh_processed_date varchar(50)
mh_control INTEGER
mh_active INTEGER
mh_duplicate INTEGER
Guessing from your post, it looks like you are trying to update the mh_duplicate column for any record that has the same phone number if it's not the first record with that phone number?
If that's correct, I think this should get you the ids to update (you may need to add back your appropriate WHERE criteria); from there, the UPDATE is straightforward:
SELECT e._Id
FROM exr_exrresv e
JOIN
( SELECT t.Phone_Number
FROM exr_exrresv t
GROUP BY t.Phone_Number
HAVING COUNT (t.Phone_Number) > 1
) e2 ON e.Phone_Number = e2.Phone_Number
LEFT JOIN
( SELECT MIN(t2._Id) as KeepId
FROM exr_exrresv t2
GROUP BY t2.Phone_Number
) e3 ON e._Id = e3.KeepId
WHERE e3.KeepId is null
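For completeness, a sketch of plugging that SELECT back into the UPDATE statement from the question:
UPDATE exr_exrresv
SET mh_duplicate = 1
WHERE exr_exrresv._id IN
(
-- the SELECT from above
SELECT e._Id
FROM exr_exrresv e
JOIN
( SELECT t.Phone_Number
FROM exr_exrresv t
GROUP BY t.Phone_Number
HAVING COUNT (t.Phone_Number) > 1
) e2 ON e.Phone_Number = e2.Phone_Number
LEFT JOIN
( SELECT MIN(t2._Id) as KeepId
FROM exr_exrresv t2
GROUP BY t2.Phone_Number
) e3 ON e._Id = e3.KeepId
WHERE e3.KeepId is null
);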
And the SQL Fiddle.
Good luck.
This considers a record duplicate if there exists an active record with a matching phone_number and with a lesser _id. (No grouping or counting needed.)
update exr_exrresv
set mh_duplicate = 1
where exr_exrresv._id in (
select target._id
from exr_exrresv as target
where target.phone_number != ''
and target.mh_active = 1
and exists (
select null from exr_exrresv as probe
where probe.phone_number = target.phone_number
and probe.mh_active = 1
and probe._id < target._id
)
)
This query will be greatly aided by an index on phone_number, ideally on exr_exrresv (phone_number, _id).
SQLFiddle
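A sketch of that index (the name is illustrative):
-- Supports the correlated probe on phone_number and the _id comparison
CREATE INDEX idx_exr_exrresv_phone_id ON exr_exrresv (phone_number, _id);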

SQL Server 2005 query optimization with Max subquery

I've got a table that looks like this (I wasn't sure what all might be relevant, so I had Toad dump the whole structure)
CREATE TABLE [dbo].[TScore] (
[CustomerID] int NOT NULL,
[ApplNo] numeric(18, 0) NOT NULL,
[BScore] int NULL,
[OrigAmt] money NULL,
[MaxAmt] money NULL,
[DateCreated] datetime NULL,
[UserCreated] char(8) NULL,
[DateModified] datetime NULL,
[UserModified] char(8) NULL,
CONSTRAINT [PK_TScore]
PRIMARY KEY CLUSTERED ([CustomerID] ASC, [ApplNo] ASC)
);
When I run the following query (on a database with 3 million records in the TScore table) it takes about a second, even though Select BScore from CustomerDB..TScore WHERE CustomerID = 12345 is instant (and only returns 10 records). It seems like there should be some efficient way to get the Max(ApplNo) effect in a single query, but I'm a relative noob to SQL Server and not sure. I'm thinking I may need a separate key for ApplNo, but I'm not sure how clustered keys work.
SELECT BScore
FROM CustomerDB..TScore (NOLOCK)
WHERE ApplNo = (SELECT Max(ApplNo)
FROM CustomerDB..TScore sc2 (NOLOCK)
WHERE sc2.CustomerID = 12345)
Thanks much for any tips (pointers on where to look for SQL Server optimization advice are appreciated as well).
When you filter by ApplNo, you are using only part of the key, and not the left-hand side. This means the index has to be scanned (look at all rows), not seeked (drill to a row), to find the values.
If you are looking for ApplNo values for the same CustomerID:
Quick way. Use the full clustered index:
SELECT BScore
FROM CustomerDB..TScore
WHERE ApplNo = (SELECT Max(ApplNo)
FROM CustomerDB..TScore sc2
WHERE sc2.CustomerID = 12345)
AND CustomerID = 12345
This can be changed into a JOIN:
SELECT BScore
FROM
CustomerDB..TScore T1
JOIN
(SELECT Max(ApplNo) AS MaxApplNo, CustomerID
FROM CustomerDB..TScore sc2
WHERE sc2.CustomerID = 12345
) T2 ON T1.CustomerID = T2.CustomerID AND T1.ApplNo= T2.MaxApplNo
If you are looking for ApplNo values independent of CustomerID, then I'd look at a separate index. This matches the intent of your current code:
CREATE INDEX IX_ApplNo ON TScore (ApplNo) INCLUDE (BScore);
Reversing the key order won't help, because then your WHERE sc2.CustomerID = 12345 will scan, not seek.
Note: using NOLOCK everywhere is a bad practice.