Good morning. I'll do my best to explain my question without posting the SQL (it's 650 lines). Let me know if more information is needed.
We have an in-house fulfillment system that is allocating inventory in real time. For allocation to work properly, we need to know how much inventory is available each time a user asks what they should be working on (by loading/reloading their task list). The data would look something like this:
ID   ItemID   QtyOrdered   QtyAvailableAfterAllocation   ParentID
1    1234     5            500                           NULL
2    1234     15           485                           1
3    1234     10           475                           2
Currently a while loop is being used to set the QtyAvailableAfterAllocation column. The example above demonstrates the need for the loop. Row 2's QtyAvailableAfterAllocation is dependent on the value of row 1's QtyAvailableAfterAllocation. Row 3 is dependent on row 2 and so on.
This is the (very) simplified version of the logic. It gets far more complicated when you take kits into account (groups of inventory items that belong to a single parent item). There are times when inventory does not need to be allocated to an item because it exists inside a kit that has sufficient inventory to fulfill the order. This is why we can't do a simple running total. Also, kits can be nested inside kits to the Nth level. Therein lies the problem. When dealing with a large number of orders that have nested kits, the performance of the query is very poor. I believe the loop is to blame (testing has confirmed this). So, here's the question:
Is it possible to commit an update, one row at a time and in a specific order (without a loop), so that the child record(s) below can access the updated column (QtyAvailAfterOrder_AllocationScope) in the parent record?
EDIT
Here is a small portion of the SQL. It's the actual while loop. Maybe this will help show the logic that's needed to determine the allocation for each record.
http://pastebin.com/VM9iasq9
Can you cheat and do something like this?
DECLARE @CurrentCount int

SELECT @CurrentCount = QtyAvailableAfterAllocation
FROM blah
WHERE <select the parent of the first row>

UPDATE blah
SET QtyAvailableAfterAllocation = @CurrentCount - QtyOrdered,
    @CurrentCount = @CurrentCount - QtyOrdered
WHERE <it is valid to deduct the count>
This should let you keep the update set-based and count downwards from a starting quantity. The crux of the problem here is the WHERE clause.
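For what it's worth, one heavily caveated way to make that idea concrete is the running-total UPDATE sketched below. This is only an experiment against the simplified blah table: the row-by-row processing order is not guaranteed by SQL Server (it generally follows the clustered index when forced to a single thread), and the starting quantity of 500 is taken from the example data above.

-- Sketch only: assumes blah(ID, ItemID, QtyOrdered, QtyAvailableAfterAllocation)
-- clustered on (ItemID, ID). Processing order is NOT guaranteed by SQL Server.
DECLARE @Running int = 500;   -- starting quantity for ItemID 1234 (from the example)

UPDATE b
SET @Running = QtyAvailableAfterAllocation = @Running - QtyOrdered
FROM blah AS b WITH (TABLOCKX)
WHERE b.ItemID = 1234
OPTION (MAXDOP 1);            -- single thread so the variable accumulates row by row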
One method we have used is to flatten a hierarchy of values (in your case, the Nth-level kits) into a table, which you can then join onto directly. Flattening the hierarchy down to a single join should help alleviate some of the performance quirks. Perhaps use a view to flatten the data.
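For example, a recursive CTE can flatten nested kits into (root kit, component) pairs. This is only a sketch: the KitComponent table and its columns (ParentItemID, ChildItemID, QtyPerParent) are assumptions standing in for whatever your kit structure actually looks like.

-- Flatten nested kits: every (root kit, component) pair with the total quantity
-- of that component needed per one unit of the root kit.
WITH KitFlat AS (
    SELECT ParentItemID AS RootItemID, ChildItemID, QtyPerParent
    FROM KitComponent
    UNION ALL
    SELECT f.RootItemID, c.ChildItemID, f.QtyPerParent * c.QtyPerParent
    FROM KitFlat AS f
    INNER JOIN KitComponent AS c ON c.ParentItemID = f.ChildItemID
)
SELECT RootItemID, ChildItemID, SUM(QtyPerParent) AS QtyPerRoot
FROM KitFlat
GROUP BY RootItemID, ChildItemID;

A SELECT like that is the sort of thing you could wrap in the view mentioned above.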
Sorry this isn't a direct answer and only ideas.
If you can provide a sample data structure showing how the kits fit in, I'm sure someone can help thrash out a more specific solution.
If you do have requests queued up in some structure, you wouldn't employ a SQL statement to process the queue; "queue" and "SQL", conceptually, are at odds: SQL is set-based, not procedural.
So, forget about using a query to manage the queued requests, and process the queue in a procedure, wrapping each part requisition in a transaction:
pseudo:
WHILE REQUESTS_REMAIN_IN_QUEUE
begin trans
execute requisition SQL statements
commit
LOOP
Your requisition statements (simplified) might look like this:
update inventory
set QOH = QOH - {requested amount}
where partno = ? and QOH >= {requested amount}
insert orderdetail
(customer, orderheaderid, partno, requestedamount)
values
(custid, orderheaderid, partno, requested_amount)
Now, in a complicated system involving kits and custom business logic, you might have a rule that says not to decrement inventory unless every component in a kit is available. Then you'd have to wrap your kit requisition in a transaction and roll back if you encounter a situation where an individual component in the kit is backordered, say.
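A rough sketch of that all-or-nothing kit requisition might look like the following; the @KitParts list is a hypothetical stand-in for however your kit components are actually represented.

-- Hypothetical component list for one kit requisition
DECLARE @KitParts TABLE (partno int, RequestedAmount int);
INSERT INTO @KitParts (partno, RequestedAmount) VALUES (1001, 2), (1002, 1);

BEGIN TRAN;

UPDATE i
SET i.QOH = i.QOH - k.RequestedAmount
FROM inventory AS i
INNER JOIN @KitParts AS k ON k.partno = i.partno
WHERE i.QOH >= k.RequestedAmount;

-- If any component could not be decremented, it is short/backordered: undo everything
IF @@ROWCOUNT < (SELECT COUNT(*) FROM @KitParts)
    ROLLBACK TRAN;
ELSE
    COMMIT TRAN;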
I think this problem can be solved using a purely set-based approach.
Basically, you need to perform these steps:
Obtain the table of currently available quantity for every item.
Obtain the running totals from the ordered quantity due to be processed.
Get QtyAvailableAfterAllocation for every item as the result of subtraction of its running total from its available quantity.
Here's a sample solution:
/* sample data definition & initialisation */
DECLARE @LastQty TABLE (Item int, Qty int);
INSERT INTO @LastQty (Item, Qty)
SELECT 0123, 404 UNION ALL
SELECT 1234, 505 UNION ALL
SELECT 2345, 606 UNION ALL
SELECT 3456, 707 UNION ALL
SELECT 4567, 808 UNION ALL
SELECT 5678, 909;
DECLARE @Orders TABLE (ID int, Item int, OrderedQty int);
INSERT INTO @Orders (ID, Item, OrderedQty)
SELECT 1, 1234, 5 UNION ALL
SELECT 2, 1234, 15 UNION ALL
SELECT 3, 2345, 3 UNION ALL
SELECT 4, 1234, 10 UNION ALL
SELECT 5, 2345, 37 UNION ALL
SELECT 6, 2345, 45 UNION ALL
SELECT 7, 3456, 50 UNION ALL
SELECT 8, 4567, 25 UNION ALL
SELECT 9, 2345, 30;
/* the actual query begins here */
WITH RankedOrders AS (
    SELECT
        *,
        rn = ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ID)
    FROM @Orders
),
RunningOrderTotals AS (
    SELECT
        ID,
        Item,
        OrderedQty,
        RunningTotalQty = OrderedQty,
        rn
    FROM RankedOrders
    WHERE rn = 1

    UNION ALL

    SELECT
        o.ID,
        o.Item,
        o.OrderedQty,
        RunningTotalQty = r.RunningTotalQty + o.OrderedQty,
        o.rn
    FROM RankedOrders o
    INNER JOIN RunningOrderTotals r ON o.Item = r.Item AND o.rn = r.rn + 1
)
SELECT
    t.ID,
    t.Item,
    t.OrderedQty,
    QtyAvailableAfterAllocation = oh.Qty - t.RunningTotalQty
FROM RunningOrderTotals t
INNER JOIN @LastQty oh ON t.Item = oh.Item
ORDER BY t.ID;
Note: For the purpose of my example I initialised the available item quantity table (@LastQty) manually. However, you will most probably derive it from your own data.
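If it helps, deriving it might look something like the sketch below; the Inventory table and its QtyOnHand column are assumptions, not your actual schema.

-- Hypothetical derivation of the currently available quantity per item
DECLARE @LastQty TABLE (Item int, Qty int);

INSERT INTO @LastQty (Item, Qty)
SELECT ItemID, SUM(QtyOnHand)
FROM Inventory
GROUP BY ItemID;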
Based on the comments/answers above and my inability to accurately represent this complicated issue properly, I've rewritten the processing in C#. Using PLINQ, I've reduced the processing time from 15 seconds to 4. Thanks to all those who tried to help!
If this isn't the appropriate way to close a question, let me know (and let me know the appropriate way so I can do that instead).
Related
I have an application where I find a Sum() of a database column for a set of records and later use that sum in a separate query, similar to the following (made up tables, but the idea is the same):
SELECT Sum(cost)
INTO v_cost_total
FROM materials
WHERE material_id >=0
AND material_id <= 10;
[a little bit of interim work]
SELECT material_id, cost/v_cost_total
INTO v_material_id_collection, v_pct_collection
FROM materials
WHERE material_id >=0
AND material_id <= 10
FOR UPDATE;
However, in theory someone could update the cost column on the materials table between the two queries, in which case the calculated percents will be off.
Ideally, I would just use a FOR UPDATE clause on the first query, but when I try that, I get an error:
ORA-01786: FOR UPDATE of this query expression is not allowed
Now, the work-around isn't the problem - just do an extra query to lock the rows before finding the Sum(), but that query would serve no other purpose than locking the tables. While this particular example is not time consuming, the extra query could cause a performance hit in certain situations, and it's not as clean, so I'd like to avoid having to do that.
Does anyone know of a particular reason why this is not allowed? In my head, the FOR UPDATE clause should just lock the rows that match the WHERE clause - I don't see why it matters what we are doing with those rows.
EDIT: It looks like SELECT ... FOR UPDATE can be used with analytic functions, as suggested by David Aldridge below. Here's the test script I used to prove this works.
SET serveroutput ON;
CREATE TABLE materials (
material_id NUMBER(10,0),
cost NUMBER(10,2)
);
ALTER TABLE materials ADD PRIMARY KEY (material_id);
INSERT INTO materials VALUES (1,10);
INSERT INTO materials VALUES (2,30);
INSERT INTO materials VALUES (3,90);
<<LOCAL>>
DECLARE
l_material_id materials.material_id%TYPE;
l_cost materials.cost%TYPE;
l_total_cost materials.cost%TYPE;
CURSOR test IS
SELECT material_id,
cost,
Sum(cost) OVER () total_cost
FROM materials
WHERE material_id BETWEEN 1 AND 3
FOR UPDATE OF cost;
BEGIN
OPEN test;
FETCH test INTO l_material_id, l_cost, l_total_cost;
Dbms_Output.put_line(l_material_id||' '||l_cost||' '||l_total_cost);
FETCH test INTO l_material_id, l_cost, l_total_cost;
Dbms_Output.put_line(l_material_id||' '||l_cost||' '||l_total_cost);
FETCH test INTO l_material_id, l_cost, l_total_cost;
Dbms_Output.put_line(l_material_id||' '||l_cost||' '||l_total_cost);
END LOCAL;
/
Which gives the output:
1 10 130
2 30 130
3 90 130
The syntax select ... for update locks records in a table to prepare for an update. When you do an aggregation, the result set no longer refers to the original rows.
In other words, there are no records in the database to update. There is just a temporary result set.
You might try something like:
<<LOCAL>>
declare
material_id materials.material_id%Type;
cost materials.cost%Type;
total_cost materials.cost%Type;
begin
select material_id,
cost,
sum(cost) over () total_cost
into local.material_id,
local.cost,
local.total_cost
from materials
where material_id between 1 and 3
for update of cost;
...
end local;
The first row gives you the total cost, but it selects all the rows and in theory they could be locked.
I don't know if this is allowed, mind you -- be interesting to hear whether it is.
For example, suppose there is a product table with id, name and stock columns, as shown below.
product table:
id | name   | stock
---+--------+------
 1 | Apple  |     3
 2 | Orange |     5
 3 | Lemon  |     8
Then both of the two queries below can run sum() and SELECT ... FOR UPDATE together:
SELECT sum(stock) FROM (SELECT * FROM product FOR UPDATE) AS result;
WITH result AS (SELECT * FROM product FOR UPDATE) SELECT sum(stock) FROM result;
Output:
sum
-----
16
(1 row)
For that, you can use the WITH command.
Example:
WITH result AS (
-- your select
) SELECT * FROM result GROUP BY material_id;
Is your problem "However, in theory someone could update the cost column on the materials table between the two queries, in which case the calculated percents will be off."?
In that case, you can probably just use an inner query:
SELECT material_id, cost/(SELECT Sum(cost)
FROM materials
WHERE material_id >=0
AND material_id <= 10)
INTO v_material_id_collection, v_pct_collection
FROM materials
WHERE material_id >=0
AND material_id <= 10;
Why do you want to lock a table? Other applications might fail if they try to update that table during that time, right?
I am sorry if the term m:n is not correct; if you know a better term, I will correct it. I have the following situation. This is my original data:
gameID
participID
result
The data itself looks like this:

1  5    10
1  4   -10
2  5   150
2  2  -100
2  1   -50
When I extract this table, it will easily have some 100 million rows and around 1 million participIDs or more.

I will need to: show all results of all games from participant x where participant y was present.

Luckily this is only needed for a very limited number of participants, but those are subject to change, so I need a complete table and can reduce it in a second step.
My idea is the following; it just looks very unoptimized.

1) Get the list of games where the "point of view participant" is included:

insert into consolidatedtable (gameid, participid, result)
select gameID, participID, sum(result)
from mastertable
where participID = x and result <> 0
group by gameID, participID

2) Get all games where the other participant is included:

insert into consolidatedtable (gameid, participid, result)
select gameID, participID, sum(result)
from mastertable
where gameID in (select gameID from consolidatedtable)
  and participID = y and result <> 0
group by gameID, participID

3) Delete all games from the consolidated table where fewer than two distinct participants remain:

delete from consolidatedtable
where gameID in (select gameID
                 from consolidatedtable
                 group by gameID
                 having count(distinct participID) < 2)
The whole thing looks like a children's solution to me:

I need a consolidated table for each player.

I insert way too many games into this table and delete them later on.

The whole thing needs to be run participant by participant over the whole master table; it would not work if I did this for several participants at the same time.

Any better ideas? There must be; this one is just so bad. The master table will be PostgreSQL on the DW server; the consolidated view will be MySQL (but the number crunching will be done in PostgreSQL).
My problems:

1) How do I build the consolidated table(s) (do I need more than one?) without having to run a single query for each player over the whole master table? I need the data for players x, y, z, no matter who else is playing. This is the consolidation task for the DW server; it should create the (condensed) table for the webserver.

2) How can I then extract the data at the webserver quickly? The table design from (1) should take this into consideration. We are not talking about a lot of players here, maybe 100, so I could either partition by player ID or just create a single table per player.
Data warehouse: PostgreSQL 9.2 (48 GB RAM, SSD)

Webserver: MySQL 5.5 (4 GB RAM, SSD)

Master table: gameID BIGINT, participID, result INT, foreign key on participID (to the participants table)

The DW server holds the master table and should also prepare the consolidated/extracted tables (processing power and SSD space are not an issue).

The webserver should hold the consolidated tables (only for the ~100 players where I need the info) and query that data in a very efficient manner, so efficient querying at the webserver matters more than the workload of the DW server.

I think this is important; sorry that I didn't include it at the beginning.

The data at the DW server updates daily, but I do not need to query the whole master table completely every day. The setup allows me to consolidate only newer values. E.g.: yesterday consolidation was up to ID 500, the current ID is 550, so today I only consolidate 501-550.
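A sketch of that incremental step might look like the following; the consolidation_watermark and interesting_players tables are hypothetical names for "where I consolidated up to last time" and "the ~100 players I care about".

-- Consolidate only rows added since the last run (IDs above the stored watermark)
INSERT INTO consolidatedtable (gameid, participid, result)
SELECT m.gameid, m.participid, SUM(m.result)
FROM mastertable m
WHERE m.gameid > (SELECT last_gameid FROM consolidation_watermark)
  AND m.participid IN (SELECT participid FROM interesting_players)
  AND m.result <> 0
GROUP BY m.gameid, m.participid;

-- Move the watermark forward for the next daily run
UPDATE consolidation_watermark
SET last_gameid = (SELECT MAX(gameid) FROM mastertable);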
Here is another idea that might work, depending on your database (and my understanding of the question):
SELECT *
FROM table a
WHERE participID = 'x'
AND EXISTS (
SELECT 1 FROM table b
WHERE b.participID = 'y'
AND b.gameID=a.gameID
);
Assuming you have indexes on the two columns (participID and gameID), the performance should be good.
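For example (PostgreSQL syntax; the index names and the mastertable name are assumptions, so adjust them to your schema):

-- Composite indexes covering both access paths of the EXISTS query
CREATE INDEX idx_master_particip_game ON mastertable (participid, gameid);
CREATE INDEX idx_master_game_particip ON mastertable (gameid, participid);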
I'd compare it to this and see which runs faster:
SELECT *
FROM table a
JOIN (
SELECT gameID
FROM table
WHERE participID = 'y'
GROUP BY gameID
) b
ON a.gameID=b.gameID
WHERE a.participID = 'x';
Sounds like you just want a self join:
For all participants:
SELECT x.gameID, x.participID, x.results, y.participID, y.results
FROM table AS x
JOIN table AS y
    ON x.gameID = y.gameID
WHERE x.participID <> y.participID
The downside of that is you'd get each participant on each side of each game.
For 2 specific particpants:
SELECT x.gameID, x.results, y.results
FROM (SELECT gameID, participID, results
      FROM table
      WHERE participID = 'x'
        AND results <> 0) AS x
JOIN (SELECT gameID, participID, results
      FROM table
      WHERE participID = 'y'
        AND results <> 0) AS y
    ON x.gameID = y.gameID
You might not need to select participID in your query, depending on what you're doing with the results.
Disclaimer: my SQL skills are basic, to say the least.
Let's say I have two similar data types in different tables of the same database.
The first table is called hardback and the fields are as follows:
hbID | hbTitle | hbPublisherID | hbPublishDate
The second table is called paperback and its fields hold similar data but the fields are named differently:
pbID | pbTitle | pbPublisherID | pbPublishDate
I need to retrieve the 10 most recent hardback and paperback books, where the publisher ID is 7.
This is what I have so far:
SELECT TOP 10
    hbID, hbTitle, hbPublisherID, hbPublishDate AS pDate,
    pbID, pbTitle, pbPublisherID, pbPublishDate AS pDate
FROM hardback CROSS JOIN paperback
WHERE (hbPublisherID = 7) OR (pbPublisherID = 7)
ORDER BY pDate DESC
This returns seven columns per row, at least three of which may or may not be for the wrong publisher. Possibly four, depending on the contents of pDate, which is almost certainly going to be a problem if the other six columns are for the correct publisher!
In an effort to release an earlier version of this software, I ran two separate queries fetching 10 records each, then sorted them by date and discarded the bottom ten, but I just know there must be a more elegant way to do it!
Any suggestions?
Aside: I was reviewing what I'd written here, when my Mac suddenly experienced a kernel panic. Restarted, reopened my tabs and everything I'd typed was still here! Stack Exchange sites are awesome :)
The easiest way is probably a UNION:
SELECT TOP 10 * FROM
(SELECT hbID, hbTitle, hbPublisherID AS PublisherID, hbPublishDate AS pDate
 FROM hardback
 UNION
 SELECT pbID, pbTitle, pbPublisherID, pbPublishDate
 FROM paperback
) books
WHERE PublisherID = 7
ORDER BY pDate DESC
If you could have two copies of the same title (1 paperback, 1 hardcover), change the UNION to a UNION ALL; UNION alone discards duplicates. You could also add a column that indicates what book type it is by adding a pseudo-column to each select (after the publish date, for instance):
hbPublishDate as pDate, 'H' as Covertype
You'll have to add the same new column to the paperback half of the query, using 'P' instead. Note that in the second query you don't have to specify column names; the result set takes its names from the first one. All column data types in the two queries have to match, too - you can't UNION a date column in the first with a numeric column in the second without converting the two columns to the same datatype in the query.
Here's a sample script for creating two tables and doing the select above. It works just fine in SQL Server Management Studio. Just remember to drop the two tables (using DROP TABLE tablename) when you're done.
use tempdb;
create table Paperback (pbID Integer Identity,
pbTitle nvarchar(30), pbPublisherID Integer, pbPubDate Date);
create table Hardback (hbID Integer Identity,
hbTitle nvarchar(30), hbPublisherID Integer, hbPubDate Date);
insert into Paperback (pbTitle, pbPublisherID, pbPubDate)
values ('Test title 1', 1, GETDATE());
insert into Hardback (hbTitle, hbPublisherID, hbPubDate)
values ('Test title 1', 1, GETDATE());
select * from (
select pbID, pbTitle, pbPublisherID, pbPubDate, 'P' as Covertype
from Paperback
union all
select hbID, hbTitle, hbPublisherID, hbPubDate,'H'
from Hardback) books
order by CoverType;
/* You'd drop the two tables here with
DROP table Paperback;
DROP table HardBack;
*/
I think it is clearly better if you make only one table, with a reference to another table that holds information about the category of the entry, such as hardback or paperback. This is my first suggestion.
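A rough sketch of that layout with assumed names follows; the TOP 10 query then needs no UNION at all.

-- Hypothetical single book table plus a cover-type lookup table
CREATE TABLE covertype (
    covertypeID int PRIMARY KEY,
    covertypeName nvarchar(20)          -- 'Hardback' or 'Paperback'
);

CREATE TABLE book (
    bookID int IDENTITY PRIMARY KEY,
    title nvarchar(30),
    publisherID int,
    publishDate date,
    covertypeID int REFERENCES covertype (covertypeID)
);

-- 10 most recent books for publisher 7, regardless of cover type
SELECT TOP 10 bookID, title, publishDate, covertypeID
FROM book
WHERE publisherID = 7
ORDER BY publishDate DESC;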
By the way, what is your programming language?
I have a problem in sql where I need to generate a packing list from a list of transactions.
Data Model
The transactions are stored in a table that contains:
transaction id
item id
item quantity
Each transaction can have multiple items (and coincidentally multiple rows with the same transaction id). Each item then has a quantity from 1 to N.
Business Problem
The business requires that we create a packing list, where each line item in the packing list contains the count of each item in the box.
Each box can only contain 160 items (they all happen to be the same size/weight). Based on the total count of the order we need to split items into different boxes (sometimes splitting even the individual item's collection into two boxes)
So the challenge is to take that data schema and come up with the result set that includes how many of each item belong in each box.
I am currently brute forcing this in some not so pretty ways and wondering if anyone has an elegant/simple solution that I've overlooked.
Example In/Out
We really need to isolate how many of each item end up in each box...for example:
Order 1:
100 of item A, 100 of item B, 140 of item C
This should result in three rows in the result set:
Box 1: A (100), B (60)
Box 2: B (40), C (120)
Box 3: C (20)
Ideally the query would be smart enough to put all of C together, but at this point - we're not too concerned with that.
How about something like
SELECT SUM([Item quantity]) AS totalItems
     , SUM([Item quantity]) / 160 AS totalBoxes
     , SUM([Item quantity]) % 160 AS amountInLastBox
FROM [Transactions]
GROUP BY [Transaction Id]
Let me know what fields in the resultset you're looking for and I could come up with a better one
I was looking for something similar and all I could achieve was expanding the rows to the number of item counts in a transaction, and then grouping them into bins. Not very elegant, though. Moreover, because string aggregation is still very cumbersome in SQL Server (Oracle, I miss you!), I have to leave out the last part, i.e. putting the counts into one single row.
My solution is as follows:
Example transactions table:
INSERT INTO transactions
(trans_id, item, cnt) VALUES
('1','A','50'),
('2','A','140'),
('3','B','100'),
('4','C','80');
GO
Create a dummy sequence table, which contains numbers from 1 to 1000 (I assume that maximum number allowed for an item in a single transaction is 1000):
CREATE TABLE numseq (n INT NOT NULL IDENTITY) ;
GO
INSERT numseq DEFAULT VALUES ;
WHILE SCOPE_IDENTITY() < 1000 INSERT numseq DEFAULT VALUES ;
GO
Now we can generate a temporary table from transactions table, in which each transaction and item exist "cnt" times in a subquery, and then give numbers to the bins using division, and group by bin number:
SELECT bin_id, item, count(*) AS count_in_bin
INTO result
FROM (
    SELECT t.item, ((row_number() over (order by t.item, s.n) - 1) / 160) + 1 AS bin_id
    FROM transactions t
    INNER JOIN numseq s
        ON t.cnt >= s.n -- join conditionally to repeat transaction rows "cnt" times
) a
GROUP BY bin_id, item
ORDER BY bin_id, item
GO
Result is:
bin_id item count_in_bin
1 A 160
2 A 30
2 B 100
2 C 30
3 C 50
In Oracle, the last step would be as simple as that:
SELECT bin_id, WM_CONCAT(CONCAT(item,'(',count_in_bin,')')) contents
FROM result
GROUP BY bin_id
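For what it's worth, the same concatenation can be done in SQL Server with the FOR XML PATH trick (or STRING_AGG on SQL Server 2017 and later); here is a sketch against the result table built above, assuming item is a character column:

-- Concatenate "item(count)" pairs per bin, comma separated
SELECT r.bin_id,
       STUFF((SELECT ', ' + r2.item + '(' + CAST(r2.count_in_bin AS varchar(10)) + ')'
              FROM result r2
              WHERE r2.bin_id = r.bin_id
              ORDER BY r2.item
              FOR XML PATH('')), 1, 2, '') AS contents
FROM result r
GROUP BY r.bin_id;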
This isn't the prettiest answer but I am using a similar method to keep track of stock items through an order process, and it is easy to understand, and may lead to you developing a better method than I have.
I would create a table called "PackedItem" or something similar. The columns would be:
packed_item_id (int) - Primary Key, Identity column
trans_id (int)
item_id (int)
box_number (int)
Each record in this table represents 1 physical unit you will ship.
Let's say someone adds a line to transaction 4 with 20 of item 12; I would add 20 records to the PackedItem table, all with the transaction ID, the item ID, and a NULL box number. If a line is updated, you need to add or remove records from the PackedItem table so that there is always a 1:1 correlation.
When the time comes to ship, you can simply
SELECT TOP 160 * FROM PackedItem WHERE trans_id = 4 AND box_number IS NULL
and set the box_number on those records to the next available box number, until no records remain where the box_number is NULL. This is possible using one fairly complicated UPDATE statement inside a WHILE loop - which I don't have the time to construct fully.
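A rough sketch of that loop could look like this (the names follow the PackedItem columns above; the ORDER BY just keeps units of the same item together):

DECLARE @box int = 1;

WHILE EXISTS (SELECT 1 FROM PackedItem WHERE trans_id = 4 AND box_number IS NULL)
BEGIN
    -- Take the next 160 unboxed units and stamp them with the current box number
    ;WITH nextBox AS (
        SELECT TOP (160) box_number
        FROM PackedItem
        WHERE trans_id = 4 AND box_number IS NULL
        ORDER BY item_id, packed_item_id
    )
    UPDATE nextBox SET box_number = @box;

    SET @box = @box + 1;
END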
You can now easily get your desired packing list by querying this table as follows:
SELECT box_number, item_id, COUNT(*) AS Qty
FROM PackedItem
WHERE trans_id = 4
GROUP BY box_number, item_id
Advantages - easy to understand, fairly easy to implement.
Pitfalls - if the table gets out of sync with the lines on the Transaction, the final result can be wrong; This table will get many records in it and will be extra work for the server. Will need each ID field to be indexed to keep performance good.
My memory is failing me. I have a simple audit log table based on a trigger:
ID int (identity, PK)
CustomerID int
Name varchar(255)
Address varchar(255)
AuditDateTime datetime
AuditCode char(1)
It has data like this:
ID  CustomerID  Name     Address              AuditDateTime            AuditCode
1   123         Bob      123 Internet Way     2009-07-17 13:18:06.353  I
2   123         Bob      123 Internet Way     2009-07-17 13:19:02.117  D
3   123         Jerry    123 Internet Way     2009-07-17 13:36:03.517  I
4   123         Bob      123 My Edited Way    2009-07-17 13:36:08.050  U
5   100         Arnold   100 SkyNet Way       2009-07-17 13:36:18.607  I
6   100         Nicky    100 Star Way         2009-07-17 13:36:25.920  U
7   110         Blondie  110 Another Way      2009-07-17 13:36:42.313  I
8   113         Sally    113 Yet another Way  2009-07-17 13:36:57.627  I
What would the efficient select statement be to get all of the most current records between a start and end time? FYI: I is for insert, D for delete, and U for update.
Am I missing anything in the audit table? My next step is to create an audit table that only records changes, yet you can extract the most recent records for the given time frame. For the life of me I cannot find it on any search engine easily. Links would work too. Thanks for the help.
Another (better?) method to keep audit history is to use a 'startDate' and 'endDate' column rather than an auditDateTime and AuditCode column. This is often the approach in tracking Type 2 changes (new versions of a row) in data warehouses.
This lets you more directly select the current rows (WHERE endDate is NULL), and you will not need to treat updates differently than inserts or deletes. You simply have three cases:
Insert: copy the full row along with a start date and NULL end date
Delete: set the End Date of the existing current row (endDate is NULL)
Update: do a Delete then Insert
Your select would simply be:
select * from AuditTable where endDate is NULL
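A minimal sketch of the three cases against a hypothetical CustomerAudit table (column names assumed, sample values taken from the question):

DECLARE @CustomerID int = 123, @Name varchar(255) = 'Bob', @Address varchar(255) = '123 Internet Way';

-- Insert: new current version, open-ended
INSERT INTO CustomerAudit (CustomerID, Name, Address, startDate, endDate)
VALUES (@CustomerID, @Name, @Address, GETDATE(), NULL);

-- Delete: close out the current version
UPDATE CustomerAudit
SET endDate = GETDATE()
WHERE CustomerID = @CustomerID AND endDate IS NULL;

-- Update: a "delete" (close the current row) followed by an "insert" of the new values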
Anyway, here's my query for your existing schema:
declare @from datetime
declare @to datetime

select b.* from (
    select
        customerId,
        max(auditdatetime) 'auditDateTime'
    from
        AuditTable
    where
        auditcode in ('I', 'U')
        and auditdatetime between @from and @to
    group by customerId
    having
        /* rely on "current" being defined as INSERTS > DELETES */
        sum(case when auditcode = 'I' then 1 else 0 end) >
        sum(case when auditcode = 'D' then 1 else 0 end)
) a
cross apply (
    select top 1 customerId, name, address, auditdatetime
    from AuditTable
    where auditdatetime = a.auditDateTime and customerId = a.customerId
) b
References
A cribsheet for data warehouses, but has a good section on type 2 changes (what you want to track)
MSDN page on data warehousing
Ok, a couple of things for audit log tables.
For most applications, we want audit tables to be extremely quick on insertion.
If the audit log is truly for diagnostic or for very irregular audit reasons, then the quickest insertion criteria is to make the table physically ordered upon insertion time.
And this means to put the audit time as the first column of the clustered index, e.g.
create unique clustered index idx_mytable on mytable(AuditDateTime, ID)
This will allow for extremely efficient select queries upon AuditDateTime O(log n), and O(1) insertions.
If you wish to look up your audit table on a per CustomerID basis, then you will need to compromise.
You may add a nonclustered index upon (CustomerID, AuditDateTime), which will allow for O(log n) lookup of per-customer audit history, however the cost will be the maintenance of that nonclustered index upon insertion - that maintenance will be O(log n) conversely.
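For example, reusing the mytable name from above:

-- Secondary index to support per-customer history lookups
create nonclustered index idx_mytable_customer on mytable (CustomerID, AuditDateTime)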
However that insertion time penalty may be preferable to the table scan (that is, O(n) time complexity cost) that you will need to pay if you don't have an index on CustomerID and this is a regular query that is performed.
An O(n) lookup for an irregular query locks the table against the writing process and can block writers up, so it is sometimes in the writers' interest to be slightly slower on insert if that guarantees readers aren't going to block their commits by having to table-scan for lack of a good index to support them.
Addition: if you are looking to restrict to a given timeframe, the most important thing first of all is the index upon AuditDateTime. And make it clustered as you are inserting in AuditDateTime order. This is the biggest thing you can do to make your query efficient from the start.
Next, if you are looking for the most recent update for every CustomerID within a given timespan, then a full scan of the data, restricted by insertion date, is required.
You will need to do a subquery upon your audit table, between the range,
select CustomerID, max(AuditDateTime) MaxAuditDateTime
from AuditTrail
where AuditDateTime >= @begin and AuditDateTime <= @end
and then incorporate that into your select query proper, eg.
select AuditTrail.* from AuditTrail
inner join
    (select CustomerID, max(AuditDateTime) MaxAuditDateTime
     from AuditTrail
     where AuditDateTime >= @begin and AuditDateTime <= @end
    ) filtration
    on filtration.CustomerID = AuditTrail.CustomerID and
       filtration.MaxAuditDateTime = AuditTrail.AuditDateTime
Another approach is using a sub select
select a.ID
     , a.CustomerID
     , a.Name
     , a.Address
     , a.AuditDateTime
     , a.AuditCode
from myauditlogtable a,
     (select max(s.ID) as maxid   -- latest row per customer; assumes ID increases with AuditDateTime
      from myauditlogtable as s
      group by s.CustomerID)
     as subq
where subq.maxid = a.ID;
Start and end time, e.g. as in between 1 am and 3 am?
Or start and end date-time, e.g. as in 2009-07-17 13:36 to 2009-07-18 13:36?