I want to prepend a string to the primary key value when creating a table in SQL.
Example:
My primary key column should automatically generate values like these:
'EMP101'
'EMP102'
'EMP103'
How can I achieve this?
Try this (SQL Server 2012 and later, where CONCAT is available):
UPDATE MyTable
SET EMPID = CONCAT('EMP' , EMPID)
Or this (SQL Server versions before 2012):
UPDATE MyTable
SET EMPID = 'EMP' + EMPID
SQLFiddle for SQL Server 2008
SQLFiddle for SQL Server 2012
Since you want auto-increment behaviour on a VARCHAR column, you can try this table schema:
CREATE TABLE MyTable
(EMP INT NOT NULL IDENTITY(1000, 1)
,[EMPID] AS 'EMP' + CAST(EMP AS VARCHAR(10)) PERSISTED PRIMARY KEY
,EMPName VARCHAR(20))
;
INSERT INTO MyTable(EMPName) VALUES
('AA')
,('BB')
,('CC')
,('DD')
,('EE')
,('FF')
Output:
| EMP | EMPID | EMPNAME |
----------------------------
| 1000 | EMP1000 | AA |
| 1001 | EMP1001 | BB |
| 1002 | EMP1002 | CC |
| 1003 | EMP1003 | DD |
| 1004 | EMP1004 | EE |
| 1005 | EMP1005 | FF |
See this SQLFiddle
Here you can see that EMPID is an auto-incremented column that also serves as the primary key.
Source: HOW TO SET IDENTITY KEY/AUTO INCREMENT ON VARCHAR COLUMN IN SQL SERVER (Thanks to @bvr)
The rule of thumb is never to use meaningful information (like an employee number or Social Security number) in primary keys. Let the key be a plain auto-incremented integer. However constant the data seems, it may change at some point (for example, new legislation arrives and all SSNs are recalculated).
It seems the only reason you want to use a non-integer key is that the key is generated by concatenating a string with another column to make it unique.
From a best-practice perspective, integer primary keys are strongly recommended, though this guidance is often ignored.
Going through the following posts might help:
Should I design a table with a primary key of varchar or int?
SQL primary key: integer vs varchar
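To make the "plain integer key, formatted ID on top" advice concrete, here is a minimal sketch. It uses Python's sqlite3 module with an in-memory SQLite database rather than SQL Server, and the names employee, employee_v, emp_no and emp_id are made up for the illustration. The integer column is the real key; the 'EMP...' string is only derived for display in a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The real key is a plain auto-incremented integer.
    CREATE TABLE employee (
        emp_no   INTEGER PRIMARY KEY AUTOINCREMENT,
        emp_name TEXT NOT NULL
    );
    -- The prefixed ID is derived, never stored or typed in by hand.
    CREATE VIEW employee_v AS
        SELECT emp_no, 'EMP' || emp_no AS emp_id, emp_name
        FROM employee;
""")

# Seed the first row explicitly so numbering starts at 101;
# later inserts continue from the highest existing emp_no.
conn.execute("INSERT INTO employee (emp_no, emp_name) VALUES (101, 'AA')")
conn.executemany("INSERT INTO employee (emp_name) VALUES (?)", [("BB",), ("CC",)])

display_rows = conn.execute(
    "SELECT emp_id, emp_name FROM employee_v ORDER BY emp_no"
).fetchall()
for emp_id, emp_name in display_rows:
    print(emp_id, emp_name)
```

If the prefix ever has to change, only the view definition changes; no key values or foreign keys need to be rewritten.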
You can achieve it at least in two ways:
Generate new id on the fly when you insert a new record
Create INSTEAD OF INSERT trigger that will do that for you
If you have a table schema like this
CREATE TABLE Table1
([emp_id] varchar(12) primary key, [name] varchar(64))
For the first scenario you can use a query
INSERT INTO Table1 (emp_id, name)
SELECT newid, 'Jhon'
FROM
(
SELECT 'EMP' + CONVERT(VARCHAR(9), COALESCE(REPLACE(MAX(emp_id), 'EMP', ''), 0) + 1) newid
FROM Table1 WITH (TABLOCKX, HOLDLOCK)
) q
Here is SQLFiddle demo
For the second scenario you can use a trigger like this
CREATE TRIGGER tg_table1_insert ON Table1
INSTEAD OF INSERT AS
BEGIN
DECLARE @max INT
SET @max =
(SELECT COALESCE(REPLACE(MAX(emp_id), 'EMP', ''), 0)
FROM Table1 WITH (TABLOCKX, HOLDLOCK)
)
INSERT INTO Table1 (emp_id, name)
SELECT 'EMP' + CONVERT(VARCHAR(9), @max + ROW_NUMBER() OVER (ORDER BY (SELECT 1))), name
FROM INSERTED
END
Here is SQLFiddle demo
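One thing worth checking with both variants above: MAX(emp_id) is taken over a string column, so it compares character by character, not numerically. The sketch below is plain Python rather than T-SQL, and next_emp_id is a hypothetical helper written for this illustration. It shows what happens once the numeric part grows by a digit, and how taking the maximum of the numeric suffix avoids it. The same idea would apply in SQL by stripping the prefix and casting to an integer inside the MAX rather than outside it.

```python
def next_emp_id(existing_ids, prefix="EMP"):
    """Return the next ID based on the largest numeric suffix."""
    numbers = [int(emp_id[len(prefix):]) for emp_id in existing_ids]
    return f"{prefix}{max(numbers, default=0) + 1}"

ids = ["EMP998", "EMP999", "EMP1000"]

# String comparison: '9' sorts after '1', so 'EMP999' beats 'EMP1000'.
lexical_max = max(ids)

# Stripping the prefix from the string maximum yields 999 + 1 = 1000,
# which collides with the existing 'EMP1000'.
naive_next = "EMP" + str(int(lexical_max[3:]) + 1)

# Comparing the numeric suffixes gives the intended result.
safe_next = next_emp_id(ids)

print(lexical_max, naive_next, safe_next)
```

As long as every ID has the same number of digits the two approaches agree, which is why the problem tends to show up late.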
I am looking to do something similar but don't see an answer to my problem here.
I want a primary Key like "JonesB_01" as this is how we want our job number represented in our production system.
--ID | First_Name | Second_Name | Phone | Etc..
-- Bob Jones 9999-999-999
--ID = Second_Name + first initial of First_Name + "_" + (01-99)
The number 01-99 has been included to allow for multiple instances of a customer with the same surname and first initial. In our industry it's not unusual for the same customer to have work done on multiple occasions without being ongoing repeat business. I expect this convention to last a very long time. If we ever exceed it, I can simply add a third digit.
I want this to auto-populate to keep data entry as simple as possible.
I managed to get a solution working with Excel formulas and a few helper cells, but I am new to SQL.
--CellA2 = "JonesB_01" (=CONCATENATE(D2,E2))
--CellB2 = "Bob"
--CellC2 = "Jones"
--CellD2 = "JonesB" (=IF(B2="","",CONCATENATE(C2,LEFT(B2))))
--CellE2 = "_01" (=CONCATENATE("_",TEXT(F2,"00")))
--CellF2 = "1" (=IF(D2="","",COUNTIF($D$2:$D2,D2)))
Thanks.
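For reference, the logic of the Excel helper cells above can be written down in a few lines. This is only a sketch in Python, not a SQL solution, and job_keys is a hypothetical name. It builds the surname-plus-initial prefix, keeps a running count per prefix (the COUNTIF step), and formats the count with two digits (the TEXT(F2,"00") step).

```python
from collections import Counter

def job_keys(customers):
    """customers: iterable of (first_name, second_name) in entry order."""
    seen = Counter()
    keys = []
    for first_name, second_name in customers:
        prefix = f"{second_name}{first_name[:1]}"   # e.g. "Jones" + "B"
        seen[prefix] += 1                           # running count per prefix
        keys.append(f"{prefix}_{seen[prefix]:02d}") # e.g. "JonesB_01"
    return keys

keys = job_keys([("Bob", "Jones"), ("Bill", "Jones"), ("Ann", "Smith")])
print(keys)
```

In a database the running count would have to be computed inside the insert, under a lock or a unique constraint, so that two concurrent inserts cannot both claim the same number.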
SELECT 'EMP' || TO_CHAR(NVL(MAX(TO_NUMBER(SUBSTR(A.EMP_NO, 4,3))), '000')+1) AS NEW_EMP_NO
FROM
(SELECT 'EMP101' EMP_NO
FROM DUAL
UNION ALL
SELECT 'EMP102' EMP_NO
FROM DUAL
UNION ALL
SELECT 'EMP103' EMP_NO
FROM DUAL
) A
EDIT: there was a mistake in the following question that explains the observations. I could delete the question but this might still be useful to someone. The mistake was that the actual query running on the server was SELECT * FROM t (which was silly) when I thought it was running SELECT t.* FROM t (which makes all the difference). See tobyobrian's answer and the comments to it.
I have a query that is too slow in a situation with the following schema. Table t has data rows indexed by t_id. t is linked to tables x and y via junction tables t_x and t_y, each of which contains only the foreign keys required for the JOINs:
CREATE TABLE t (
t_id INT NOT NULL PRIMARY KEY,
data columns...
);
CREATE TABLE t_x (
t_id INT NOT NULL,
x_id INT NOT NULL,
PRIMARY KEY (t_id, x_id),
KEY (x_id)
);
CREATE TABLE t_y (
t_id INT NOT NULL,
y_id INT NOT NULL,
PRIMARY KEY (t_id, y_id),
KEY (y_id)
);
I need to export the stray rows in t, i.e. those not referenced in either junction table.
SELECT t.* FROM t
LEFT JOIN t_x ON t_x.t_id=t.t_id
LEFT JOIN t_y ON t_y.t_id=t.t_id
WHERE t_x.t_id IS NULL OR t_y.t_id IS NULL
INTO OUTFILE ...;
t has 21 M rows while t_x and t_y both have about 25 M rows. So this is naturally going to be a slow query.
I'm using MyISAM so I thought I'd try to speed it up by preloading the t_x and t_y indexes. The combined size of t_x.MYI and t_y.MYI was about 1.2 M bytes so I created a dedicated key buffer for them, assigned their PRIMARY keys to the dedicated buffer and LOAD INDEX INTO CACHE'ed them.
But as I watch the query in operation, mysqld is using about 1% CPU, the average system IO pending queue length is around 5, and mysqld's average seek size is in the 250 k range. Moreover, nearly all the IO is mysqld reading from t_x.MYI and t_x.MYD.
I don't understand:
Why is mysqld reading the .MYD files at all?
Why isn't mysqld using the preloaded t_x and t_y indexes?
Could it have something to do with the t_x and t_y PRIMARY keys being over two columns?
EDIT: The query explained:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-----------+----------+-------------+
| 1 | SIMPLE | t | ALL | NULL | NULL | NULL | NULL | 20980052 | |
| 1 | SIMPLE | t_x | ref | PRIMARY | PRIMARY | 4 | db.t.t_id | 235849 | Using index |
| 1 | SIMPLE | t_y | ref | PRIMARY | PRIMARY | 4 | db.t.t_id | 207947 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-----------+----------+-------------+
Use NOT EXISTS. It should be the fastest option here, much better than joins or NOT IN in this situation.
SELECT a.* FROM t a
WHERE NOT EXISTS (SELECT 1 FROM t_x b
WHERE b.t_id = a.t_id)
OR NOT EXISTS (SELECT 1 FROM t_y c
WHERE c.t_id = a.t_id);
I can answer part 1 of your question, and I may or may not be able to answer part 2 if you post the output of EXPLAIN:
In order to select t.* it needs to look in the MYD file. Only the primary key is in the index, so to fetch the data columns you requested it needs to read the rest of the row.
That is, your query is quite probably filtering the results very quickly; it's just struggling to copy all the data you asked for.
Also note that you will probably have duplicates in your output: if one row has no refs in t_x but 3 in t_y, you will get the same t.* repeated 3 times. Given that we think the WHERE clause is sufficiently efficient and much of the time is spent reading the actual data, this is quite possibly the source of your problems. Try changing to SELECT DISTINCT and see if that helps.
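The duplication described above is easy to reproduce on a toy data set. The sketch below uses Python's sqlite3 module with an in-memory SQLite database instead of MySQL/MyISAM, and the column payload is a stand-in for the real data columns. Row 2 has no entry in t_x but three in t_y, so the LEFT JOIN form returns it three times, while a NOT EXISTS form returns each stray row once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t   (t_id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE t_x (t_id INTEGER NOT NULL, x_id INTEGER NOT NULL,
                      PRIMARY KEY (t_id, x_id));
    CREATE TABLE t_y (t_id INTEGER NOT NULL, y_id INTEGER NOT NULL,
                      PRIMARY KEY (t_id, y_id));
    INSERT INTO t   VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t_x VALUES (1, 10);
    INSERT INTO t_y VALUES (1, 20), (2, 21), (2, 22), (2, 23);
""")

# Same shape as the query in the question.
left_join_ids = [row[0] for row in conn.execute("""
    SELECT t.t_id FROM t
    LEFT JOIN t_x ON t_x.t_id = t.t_id
    LEFT JOIN t_y ON t_y.t_id = t.t_id
    WHERE t_x.t_id IS NULL OR t_y.t_id IS NULL
    ORDER BY t.t_id
""")]

# Anti-join form: each row of t is tested once, so no duplicates.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT a.t_id FROM t a
    WHERE NOT EXISTS (SELECT 1 FROM t_x b WHERE b.t_id = a.t_id)
       OR NOT EXISTS (SELECT 1 FROM t_y c WHERE c.t_id = a.t_id)
    ORDER BY a.t_id
""")]

print(left_join_ids)
print(not_exists_ids)
```

Both forms keep the OR from the question, so a row counts as stray when it is missing from at least one junction table; if "stray" should mean missing from both, the OR becomes AND in either form.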
This may be a bit more efficient:
SELECT *
FROM t
WHERE t.t_id NOT IN (
SELECT DISTINCT t_id
FROM t_x
UNION
SELECT DISTINCT t_id
FROM t_y
);
(I wish I could have come up with a more descriptive title... suggest one or edit this post if you can name the type of query I'm asking about)
Database: SQL Server 2000
Sample Data (assume 500,000 rows):
Name Candy PreferenceFactor
Jim Chocolate 1.0
Brad Lemon Drop .9
Brad Chocolate .1
Chris Chocolate .5
Chris Candy Cane .5
499,995 more rows...
Note that the number of rows with a given 'Name' is unbounded.
Desired Query Results:
Jim Chocolate 1.0
Brad Lemon Drop .9
Chris Chocolate .5
~250,000 more rows...
(Since Chris has equal preference for Candy Cane and Chocolate, a consistent result is adequate).
Question:
How do I Select Name, Candy from data where each resulting row contains a unique Name such that the Candy selected has the highest PreferenceFactor for each Name. (speedy efficient answers preferred).
What indexes are required on the table? Does it make a difference if Name and Candy are integer indexes into another table (aside from requiring some joins)?
You will find that the following query outperforms every other answer given, as it works with a single scan. This simulates MS Access's First and Last aggregate functions, which is basically what you are doing.
Of course, you'll probably have foreign keys instead of names in your CandyPreference table. To answer your question, it is in fact very much best if Candy and Name are foreign keys into another table.
If there are other columns in the CandyPreferences table, then having a covering index that includes the involved columns will yield even better performance. Making the columns as small as possible will increase the rows per page and again increase performance. If you are most often doing the query with a WHERE condition to restrict rows, then an index that covers the WHERE conditions becomes important.
Peter was on the right track for this, but had some unneeded complexity.
CREATE TABLE #CandyPreference (
[Name] varchar(20),
Candy varchar(30),
PreferenceFactor decimal(11, 10)
)
INSERT #CandyPreference VALUES ('Jim', 'Chocolate', 1.0)
INSERT #CandyPreference VALUES ('Brad', 'Lemon Drop', .9)
INSERT #CandyPreference VALUES ('Brad', 'Chocolate', .1)
INSERT #CandyPreference VALUES ('Chris', 'Chocolate', .5)
INSERT #CandyPreference VALUES ('Chris', 'Candy Cane', .5)
SELECT
[Name],
Candy = Substring(PackedData, 13, 30),
PreferenceFactor = Convert(decimal(11,10), Left(PackedData, 12))
FROM (
SELECT
[Name],
PackedData = Max(Convert(char(12), PreferenceFactor) + Candy)
FROM #CandyPreference
GROUP BY [Name]
) X
DROP TABLE #CandyPreference
I actually don't recommend this method unless performance is critical. The "canonical" way to do it is OrbMan's standard Max/GROUP BY derived table, followed by a join back to it to get the selected row. That method becomes difficult, though, when several columns participate in selecting the Max and the final combination of selectors can be duplicated, that is, when no column provides arbitrary uniqueness. Here the candy name plays that role when the PreferenceFactor is the same.
Edit: It's probably best to give some more usage notes to help improve clarity and to help people avoid problems.
As a general rule of thumb, when trying to improve query performance, you can do a LOT of extra math if it will save you I/O. Saving an entire table seek or scan speeds up the query substantially, even with all the converts and substrings and so on.
Due to precision and sorting issues, use of a floating point data type is probably a bad idea with this method. Though unless you are dealing with extremely large or small numbers, you shouldn't be using float in your database anyway.
The best data types are those that are not packed and sort in the same order after conversion to binary or char. Datetime, smalldatetime, bigint, int, smallint, and tinyint all convert directly to binary and sort correctly because they are not packed. With binary, avoid left() and right(), use substring() to get the values reliably returned to their originals.
I took advantage of Preference having only one digit in front of the decimal point in this query, allowing conversion straight to char since there is always at least a 0 before the decimal point. If more digits are possible, you would have to decimal-align the converted number so things sort correctly. Easiest might be to multiply your Preference rating so there is no decimal portion, convert to bigint, and then convert to binary(8). In general, conversion between numbers is faster than conversion between char and another data type, especially with date math.
Watch out for nulls. If there are any, you must convert them to something and then back.
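The packing idea from this answer can be tried on the sample data with a small sketch. It runs on SQLite through Python's sqlite3 module, not on SQL Server, so printf('%.10f', ...) replaces Convert(char(12), ...) and substr replaces Substring; the table and column names are adapted for the illustration. The fixed-width, 12-character preference goes first so that the string maximum is decided by the preference, with the candy name breaking ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candy_preference (name TEXT, candy TEXT, preference_factor REAL);
    INSERT INTO candy_preference VALUES
        ('Jim',   'Chocolate',  1.0),
        ('Brad',  'Lemon Drop', 0.9),
        ('Brad',  'Chocolate',  0.1),
        ('Chris', 'Chocolate',  0.5),
        ('Chris', 'Candy Cane', 0.5);
""")

packed_rows = conn.execute("""
    SELECT name,
           substr(packed, 13)                  AS candy,
           CAST(substr(packed, 1, 12) AS REAL) AS preference_factor
    FROM (
        -- One pass over the table: pack, take the max, unpack outside.
        SELECT name,
               MAX(printf('%.10f', preference_factor) || candy) AS packed
        FROM candy_preference
        GROUP BY name
    )
    ORDER BY name
""").fetchall()

for name, candy, preference_factor in packed_rows:
    print(name, candy, preference_factor)
```

The fixed width only holds here because every preference has exactly one digit before the decimal point, which is the same assumption the answer states.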
select c.Name, max(c.Candy) as Candy, max(c.PreferenceFactor) as PreferenceFactor
from Candy c
inner join (
select Name, max(PreferenceFactor) as MaxPreferenceFactor
from Candy
group by Name
) cm on c.Name = cm.Name and c.PreferenceFactor = cm.MaxPreferenceFactor
group by c.Name
order by PreferenceFactor desc, Name
I tried:
SELECT X.PersonName,
(
SELECT TOP 1 Candy
FROM CandyPreferences
WHERE PersonName=X.PersonName AND PreferenceFactor=x.HighestPreference
) AS TopCandy
FROM
(
SELECT PersonName, MAX(PreferenceFactor) AS HighestPreference
FROM CandyPreferences
GROUP BY PersonName
) AS X
This seems to work, though I can't speak to efficiency without real data and a realistic load.
I did create a primary key over PersonName and Candy, though. Using SQL Server 2008 and no additional indexes shows it using two clustered index scans though, so it could be worse.
I played with this a bit more because I needed an excuse to play with the Data Generation Plan capability of "datadude". First, I refactored the one table to have separate tables for candy names and person names. I did this mostly because it allowed me to use the test data generation without having to read the documentation. The schema became:
CREATE TABLE [Candies](
[CandyID] [int] IDENTITY(1,1) NOT NULL,
[Candy] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Candies] PRIMARY KEY CLUSTERED
(
[CandyID] ASC
),
CONSTRAINT [UC_Candies] UNIQUE NONCLUSTERED
(
[Candy] ASC
)
)
GO
CREATE TABLE [Persons](
[PersonID] [int] IDENTITY(1,1) NOT NULL,
[PersonName] [nvarchar](100) NOT NULL,
CONSTRAINT [PK_Preferences.Persons] PRIMARY KEY CLUSTERED
(
[PersonID] ASC
)
)
GO
CREATE TABLE [CandyPreferences](
[PersonID] [int] NOT NULL,
[CandyID] [int] NOT NULL,
[PrefernceFactor] [real] NOT NULL,
CONSTRAINT [PK_CandyPreferences] PRIMARY KEY CLUSTERED
(
[PersonID] ASC,
[CandyID] ASC
)
)
GO
ALTER TABLE [CandyPreferences]
WITH CHECK ADD CONSTRAINT [FK_CandyPreferences_Candies] FOREIGN KEY([CandyID])
REFERENCES [Candies] ([CandyID])
GO
ALTER TABLE [CandyPreferences]
CHECK CONSTRAINT [FK_CandyPreferences_Candies]
GO
ALTER TABLE [CandyPreferences]
WITH CHECK ADD CONSTRAINT [FK_CandyPreferences_Persons] FOREIGN KEY([PersonID])
REFERENCES [Persons] ([PersonID])
GO
ALTER TABLE [CandyPreferences]
CHECK CONSTRAINT [FK_CandyPreferences_Persons]
GO
The query became:
SELECT P.PersonName, C.Candy
FROM (
SELECT X.PersonID,
(
SELECT TOP 1 CandyID
FROM CandyPreferences
WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
) AS TopCandy
FROM
(
SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
FROM CandyPreferences
GROUP BY PersonID
) AS X
) AS Y
INNER JOIN Persons P ON Y.PersonID = P.PersonID
INNER JOIN Candies C ON Y.TopCandy = C.CandyID
With 150,000 candies, 200,000 persons, and 500,000 CandyPreferences, the query took about 12 seconds and produced 200,000 rows.
The following result surprised me. I changed the query to remove the final "pretty" joins:
SELECT X.PersonID,
(
SELECT TOP 1 CandyID
FROM CandyPreferences
WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
) AS TopCandy
FROM
(
SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
FROM CandyPreferences
GROUP BY PersonID
) AS X
This now takes two or three seconds for 200,000 rows.
Now, to be clear, nothing I've done here has been meant to improve the performance of this query: I considered 12 seconds to be a success. It now says it spends 90% of its time in a clustered index seek.
Comment on Emtucifor's solution (as I can't make regular comments)
I like this solution, but have some comments on how it could be improved (in this specific case).
Not much can be done if you have everything in one table, but having a few tables, as in John Saunders' solution, makes things a bit different.
As we are dealing with numbers in the [CandyPreferences] table, we can use a math operation instead of concatenation to get the max value.
I suggest making PreferenceFactor decimal instead of real, as I believe we don't need the range of the real data type here. Going further, I would suggest decimal(n,n) where n<10, so that only the decimal part is stored, in 5 bytes. Assuming decimal(3,3) is enough (1000 levels of preference factor), we can simply do
PackedData = Max(PreferenceFactor + CandyID)
Further, if we know we have fewer than 1,000,000 CandyIDs, we can add a cast:
PackedData = Max(Cast(PreferenceFactor + CandyID as decimal(9,3)))
allowing SQL Server to use 5 bytes in the temporary table.
Unpacking is easy and fast using the floor function.
Niikola
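A point worth verifying before relying on numeric packing: MAX compares the whole packed number, so whichever value sits in the high-order digits decides the winner. The sketch below is plain Python arithmetic, not T-SQL, with two made-up rows for one person and an assumed scale factor SCALE. It compares the sum PreferenceFactor + CandyID with a variant that puts the preference level in the high-order digits.

```python
from decimal import Decimal

# (candy_id, preference_factor) rows for a single person; sample values.
prefs = [(7, Decimal("0.900")), (42, Decimal("0.100"))]

# Sum as written above: the integer part is the candy ID, so the maximum
# is decided by the largest candy ID, not the largest preference.
summed = max(p + c for c, p in prefs)
summed_candy = int(summed)             # floor of a positive value
summed_pref = summed - summed_candy

# Variant: preference level (0..999) in the high-order digits,
# candy ID (assumed below 1,000,000) in the low-order digits.
SCALE = 1_000_000
packed = max(int(p * 1000) * SCALE + c for c, p in prefs)
pref_level, packed_candy = divmod(packed, SCALE)

print(summed_candy, summed_pref)       # candy chosen by the plain sum
print(packed_candy, pref_level)        # candy chosen by the scaled variant
```

With these two rows the plain sum selects candy 42 at preference 0.100, while the scaled variant selects candy 7 at level 900 (0.900), so it is worth checking which ordering a given packing formula actually produces.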
-- ADDED LATER ---
I tested both solutions, John's and Emtucifor's (modified to use John's structure and using my suggestions). I tested also with and without joins.
Emtucifor's solution clearly wins, but margins are not huge. It could be different if SQL server had to perform some Physical reads, but they were 0 in all cases.
Here are the queries:
SELECT
[PersonID],
CandyID = Floor(PackedData),
PreferenceFactor = Cast(PackedData-Floor(PackedData) as decimal(3,3))
FROM (
SELECT
[PersonID],
PackedData = Max(Cast([PrefernceFactor] + [CandyID] as decimal(9,3)))
FROM [z5CandyPreferences] With (NoLock)
GROUP BY [PersonID]
) X
SELECT X.PersonID,
(
SELECT TOP 1 CandyID
FROM z5CandyPreferences
WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
) AS TopCandy,
HighestPreference as PreferenceFactor
FROM
(
SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
FROM z5CandyPreferences
GROUP BY PersonID
) AS X
Select p.PersonName,
c.Candy,
y.PreferenceFactor
From z5Persons p
Inner Join (SELECT [PersonID],
CandyID = Floor(PackedData),
PreferenceFactor = Cast(PackedData-Floor(PackedData) as decimal(3,3))
FROM ( SELECT [PersonID],
PackedData = Max(Cast([PrefernceFactor] + [CandyID] as decimal(9,3)))
FROM [z5CandyPreferences] With (NoLock)
GROUP BY [PersonID]
) X
) Y on p.PersonId = Y.PersonId
Inner Join z5Candies c on c.CandyId=Y.CandyId
Select p.PersonName,
c.Candy,
y.PreferenceFactor
From z5Persons p
Inner Join (SELECT X.PersonID,
( SELECT TOP 1 cp.CandyId
FROM z5CandyPreferences cp
WHERE PersonID=X.PersonID AND cp.[PrefernceFactor]=X.HighestPreference
) CandyId,
HighestPreference as PreferenceFactor
FROM ( SELECT PersonID,
MAX(PrefernceFactor) AS HighestPreference
FROM z5CandyPreferences
GROUP BY PersonID
) AS X
) AS Y on p.PersonId = Y.PersonId
Inner Join z5Candies as c on c.CandyID=Y.CandyId
And the results:
TableName nRows
------------------ -------
z5Persons 200,000
z5Candies 150,000
z5CandyPreferences 497,445
Query Rows Affected CPU time Elapsed time
--------------------------- ------------- -------- ------------
Emtucifor (no joins) 183,289 531 ms 3,122 ms
John Saunders (no joins) 183,289 1,266 ms 2,918 ms
Emtucifor (with joins) 183,289 1,031 ms 3,990 ms
John Saunders (with joins) 183,289 2,406 ms 4,343 ms
Emtucifor (no joins)
--------------------------------------------
Table Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences 1 2,022
John Saunders (no joins)
--------------------------------------------
Table Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences 183,290 587,677
Emtucifor (with joins)
--------------------------------------------
Table Scan count logical reads
------------------- ---------- -------------
Worktable 0 0
z5Candies 1 526
z5CandyPreferences 1 2,022
z5Persons 1 733
John Saunders (with joins)
--------------------------------------------
Table Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences 183292 587,912
z5Persons 3 802
Worktable 0 0
z5Candies 3 559
Worktable 0 0
you could use following select statements
select Name,Candy,PreferenceFactor
from candyTable ct
where PreferenceFactor =
(select max(PreferenceFactor)
from candyTable where ct.Name = Name)
but with this select you will get "Chris" twice in your result set.
If you want the most preferred candy for a given user, then use
select top 1 Name,Candy,PreferenceFactor
from candyTable ct
where name = @name
and PreferenceFactor=
(select max([PreferenceFactor])
from candyTable where name = @name )
I think changing the name and candy columns to integer types might help improve performance. You should also create indexes on both columns.
[Edit] changed ! to @
SELECT Name, Candy, PreferenceFactor
FROM table AS a
WHERE NOT EXISTS(SELECT * FROM table AS b
WHERE b.Name = a.Name
AND (b.PreferenceFactor > a.PreferenceFactor OR (b.PreferenceFactor = a.PreferenceFactor AND b.Candy > a.Candy)))
select name, candy, max(preference)
from tablename
where candy=@candy
group by name, candy
order by name, candy
Usually indexing is needed on columns that frequently appear in the WHERE clause. In this case I would say indexing the name and candy columns has the highest priority.
Whether to have lookup tables for columns usually depends on the number of repeated values within those columns. If, out of 250,000 rows, there are only 50 distinct values that keep repeating, you really should have an integer reference (foreign key) there. In this case, candy should be a reference, and whether name should be one depends on the number of distinct people in the database.
I changed your column Name to PersonName to avoid any common reserved word conflicts.
SELECT PersonName, MAX(Candy) AS PreferredCandy, MAX(PreferenceFactor) AS Factor
FROM CandyPreference
GROUP BY PersonName
ORDER BY Factor DESC
SELECT d.Name, a.Candy, d.MaxPref
FROM myTable a, (SELECT Name, MAX(PreferenceFactor) AS MaxPref FROM myTable GROUP BY Name) as D
WHERE a.Name = d.Name AND a.PreferenceFactor = d.MaxPref
This should give you the rows whose PreferenceFactor matches the maximum for a given Name
(e.g. if John has a top preference of 1 for both Lemon and Chocolate, both rows are returned).
Pardon my answer, as I am writing it without SQL Query Analyzer.
Something like this would work:
select name
, candy = substring(preference,7,len(preference))
-- convert back to float/numeric
, factor = convert(float,substring(preference,1,5))/10
from (
select name,
preference = (
select top 1
-- convert from float/numeric to zero-padded fixed-width string
right('00000'+convert(varchar,convert(decimal(5,0),preferencefactor*10)),5)
+ ';' + candy
from candyTable b
where a.name = b.name
order by
preferencefactor desc
, candy
)
from (select distinct name from candyTable) a
) a
Performance should be decent with this method. Check your query plan.
TOP 1 ... ORDER BY in a correlated subquery allows us to specify arbitrary rules for which row we want returned per row in the outer query. In this case, we want the highest preference factor per name, with candy for tie-breaks.
Subqueries can only return one value, so we must combine candy and preference factor into one field. The semicolon is just for readability here, but in other cases, you might use it to parse the combined field with CHARINDEX in the outer query.
If you wanted full precision in the output, you could use this instead (assuming preferencefactor is a float):
convert(varchar,preferencefactor) + ';' + candy
And then parse it back with:
factor = convert(float,substring(preference,1,charindex(';',preference)-1))
candy = substring(preference,charindex(';',preference)+1,len(preference))
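The "TOP 1 ... ORDER BY in a correlated subquery" pattern can be tried in isolation with a small sketch. It runs on SQLite through Python's sqlite3 module, where LIMIT 1 plays the role of TOP 1, and it returns only the candy, so no packing is needed; the table and column names are adapted for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candy_preference (name TEXT, candy TEXT, preference_factor REAL);
    INSERT INTO candy_preference VALUES
        ('Jim',   'Chocolate',  1.0),
        ('Brad',  'Lemon Drop', 0.9),
        ('Brad',  'Chocolate',  0.1),
        ('Chris', 'Chocolate',  0.5),
        ('Chris', 'Candy Cane', 0.5);
""")

top_rows = conn.execute("""
    SELECT a.name,
           (SELECT b.candy
            FROM candy_preference b
            WHERE b.name = a.name
            -- Highest preference first; candy name breaks ties.
            ORDER BY b.preference_factor DESC, b.candy
            LIMIT 1) AS top_candy
    FROM (SELECT DISTINCT name FROM candy_preference) a
    ORDER BY a.name
""").fetchall()

for name, top_candy in top_rows:
    print(name, top_candy)
```

Because the tie-break sorts candy ascending, Chris gets Candy Cane here, whereas a MAX over a packed string would pick Chocolate; either is a consistent result in the sense the question asks for.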
I also tested a ROW_NUMBER() version and added an additional index:
Create index IX_z5CandyPreferences On z5CandyPreferences(PersonId,PrefernceFactor,CandyID)
The difference in response times between Emtucifor's version and the ROW_NUMBER() version (with the index in place) is marginal, if there is one at all. The test should be repeated a number of times and averaged, but I don't expect any significant difference.
Here is the query:
Select p.PersonName,
c.Candy,
y.PrefernceFactor
From z5Persons p
Inner Join (Select * from (Select cp.PersonId,
cp.CandyId,
cp.PrefernceFactor,
ROW_NUMBER() over (Partition by cp.PersonId Order by cp.PrefernceFactor desc, cp.CandyId ) as hp
From z5CandyPreferences cp) X
Where hp=1) Y on p.PersonId = Y.PersonId
Inner Join z5Candies c on c.CandyId=Y.CandyId
and results with and without new index:
| Without index | With Index
----------------------------------------------
Query (Aff.Rows 183,290) |CPU time Elapsed time | CPU time Elapsed time
-------------------------- |-------- ------------ | -------- ------------
Emtucifor (with joins) |1,031 ms 3,990 ms | 890 ms 3,758 ms
John Saunders (with joins) |2,406 ms 4,343 ms | 1,735 ms 3,414 ms
ROW_NUMBER() (with joins) |2,094 ms 4,888 ms | 953 ms 3,900 ms
Emtucifor (with joins) Without index | With Index
-----------------------------------------------------------------------
Table |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
Worktable | 0 0 | 0 0
z5Candies | 1 526 | 1 526
z5CandyPreferences | 1 2,022 | 1 990
z5Persons | 1 733 | 1 733
John Saunders (with joins) Without index | With Index
-----------------------------------------------------------------------
Table |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
z5CandyPreferences | 183292 587,912 | 183,290 585,570
z5Persons | 3 802 | 1 733
Worktable | 0 0 | 0 0
z5Candies | 3 559 | 1 526
Worktable | 0 0 | - -
ROW_NUMBER() (with joins) Without index | With Index
-----------------------------------------------------------------------
Table |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
z5CandyPreferences | 3 2233 | 1 990
z5Persons | 3 802 | 1 733
z5Candies | 3 559 | 1 526
Worktable | 0 0 | 0 0