Definition of a quirky update - SQL

Based on Itzik Ben-Gan's article in ITProToday
Microsoft’s implementation follows the physical data independence
principle, and therefore does not guarantee that you will get the data
back from a query in any particular order unless you add an ORDER BY
clause in the outer query. A similar violation of the principle is
when people update data and the solution’s correctness relies on the
data being updated in clustered index order (do a Web search on
“quirky update” to see what I mean).
I tried to find out what a quirky update means, but in vain.
I am looking for an example to understand the concept.

Here's an example of the "Quirky Update": a multi-assignment UPDATE (SET @var = column = expression) that computes a running total. Its correctness relies on SQL Server updating the rows in clustered index order, which the engine does not guarantee.
use tempdb
go
drop table if exists t
go
create table t (id int primary key, Amount int, RunningTotal int)
insert into t (id, Amount, RunningTotal) values (1, 4, 0), (2, 2, 0), (3, 6, 0)

declare @t int = 0
update t set @t = RunningTotal = @t + Amount

select * from t
outputs
id Amount RunningTotal
----------- ----------- ------------
1 4 4
2 2 6
3 6 12
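For contrast, here is a sketch of the supported way to get the same running total without relying on any physical update order, using a window aggregate (requires SQL Server 2012 or later):

select id, Amount,
       sum(Amount) over (order by id rows unbounded preceding) as RunningTotal
from t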

Related

Get all missing values between two limits in SQL table column

I am trying to assign ID numbers to records that are being inserted into an SQL Server 2005 database table. Since these records can be deleted, I would like these records to be assigned the first available ID in the table. For example, if I have the table below, I would like the next record to be entered at ID 4 as it is the first available.
| ID | Data |
| 1  | ...  |
| 2  | ...  |
| 3  | ...  |
| 5  | ...  |
The way that I would prefer this to be done is to build up a list of available IDs via an SQL query. From there, I can do all the checks within the code of my application.
So, in summary, I would like an SQL query that retrieves all available IDs between 1 and 99999 from a specific table column.
First, build a table of all possible IDs.
declare @allPossibleIds table (id integer)
declare @currentId integer
select @currentId = 1
while @currentId < 1000000
begin
    insert into @allPossibleIds
    select @currentId
    select @currentId = @currentId + 1
end
Then, left join that table to your real table. You can select MIN if you want, or you could limit @allPossibleIds to be less than the max ID in your table.
select a.id
from @allPossibleIds a
left outer join YourTable t
    on a.id = t.Id
where t.id is null
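If filling a million-row table variable in a loop is too slow, here is a set-based sketch of the same idea using a recursive CTE (available from SQL Server 2005 onward); YourTable and Id are the same placeholder names used above:

;with nums as
(
    select 1 as id
    union all
    select id + 1 from nums where id < 99999
)
select n.id
from nums n
left outer join YourTable t on t.Id = n.id
where t.Id is null
option (maxrecursion 0);  -- lift the default limit of 100 recursions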
Don't go for identity.
Let me give you an easy option while I work on a proper one.
Store the integers from 1 to 999999 in a table, say Insert_sequence.
Then write an SP for insertion: identify the minimum value that is present in Insert_sequence but not in your main table, store this value in a variable, and insert the row with the ID from that variable.
Regards,
Ashutosh Arya
You could also loop through the keys, and when you hit a missing one, select it and exit the loop.
DECLARE @intStart INT, @loop bit
SET @intStart = 1
SET @loop = 1
WHILE (@loop = 1)
BEGIN
    IF NOT EXISTS (SELECT [Key] FROM [Table] WHERE [Key] = @intStart)
    BEGIN
        SELECT @intStart AS 'FreeKey'
        SET @loop = 0
    END
    SET @intStart = @intStart + 1
END
GO
From there you can use the key as you please. Adding an @intStop variable to limit the loop would be no problem.
Why do you need a table of 1..999999? All the information you need is in your source table. Here is a query which gives you the minimal ID to insert into a gap.
It works for all combinations:
(2,3,4,5) -> 1
(1,2,3,5) -> 4
(1,2,3,4) -> 5
SQLFiddle demo
select min(t1.id) + 1
from
(
    select id from t
    union
    select 0
) t1
left join t as t2 on t1.id = t2.id - 1
where t2.id is null
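A quick way to try it out; this setup reproduces the (1,2,3,5) case above, so the query returns 4:

create table t (id int primary key);
insert into t (id) values (1), (2), (3), (5);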
Many people use an auto-incrementing integer or long value for the primary key of their tables, often called ID or MyEntityID or something similar. Since it is just an auto-incrementing integer, this column often has nothing to do with the data being stored itself.
These types of "primary keys" are called surrogate keys. They have no meaning. Many people like these IDs to be sequential because it is "aesthetically pleasing", but this is a waste of time and resources. The database couldn't care less about which IDs are in use and which are not.
I would highly suggest you forget trying to do this and just leave the ID column to auto-increment. You should also create an index on your table made up of those (subset of) columns that uniquely identify each record (and even consider using this index as your primary key index). In the rare case where you would need all columns to do that, an auto-incrementing primary key ID is extremely useful, because it may not be performant to create an index over every column in the table. Even so, the database engine couldn't care less about this ID (e.g. which ones are in use and which are not).
Also consider that an integer-based ID allows roughly 4.2 billion distinct values. It is quite unlikely that you'll exhaust the supply of integer-based IDs in any short amount of time, which further bolsters the argument that this sort of thing is a waste of time and resources.
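A minimal sketch of that advice, with hypothetical table and column names:

create table Orders
(
    OrderID    int identity(1,1) not null,  -- surrogate key; gaps are harmless
    CustomerID int not null,
    OrderDate  datetime not null,
    constraint PK_Orders primary key (OrderID)
);

-- a unique index over the columns that naturally identify each record
create unique index UX_Orders_CustomerDate on Orders (CustomerID, OrderDate);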

Process SQL Table with no Unique Column

We have a table which keeps the log of internet usage inside our company. This table is filled by software we bought, and we cannot make any changes to its schema. The table does not have a unique key or index (to make writes faster, as its developers say).
I need to read the data in this table to create real-time reports of internet usage by our users.
Currently I'm reading data from this table in chunks of 1000 records. My problem is keeping track of the last record I have read from the table, so I can read the next 1000 records.
What is the best possible solution to this problem?
By the way, older records may get deleted by the software as needed if the database file gets big.
Depending on your version of SQL Server, you can use row_number(). Once the row_number() is assigned, then you can page through the records:
select *
from
(
    select *,
           row_number() over (order by id) rn
    from yourtable
) src
where rn between 1 and 1000
Then when you want to get the next set of records, you could change the values in the WHERE clause to:
where rn between 1001 and 2000
Based on your comment that the data gets deleted, I would do the following.
First, insert the data into a temptable:
select *, row_number() over(order by id) rn
into #temp
from yourtable
Then you can select the data by row number in any block as needed.
select *
from #temp
where rn between 1 and 1000
This would also help:
declare @numRecords int = 1000   -- number of records needed per request
declare @requestCount int = 0    -- request number, starting from 0 and increasing by 1

select top (@numRecords) *
from
(
    select *, row_number() over (order by id) rn
    from yourtable
) T
where rn > @requestCount * @numRecords
EDIT: As per comments,
CREATE PROCEDURE [dbo].[select_myrecords]
    -- Number of records needed per request
    @NumRecords int,             --(= 1000)
    -- Datetime of the LAST RECORD of the previous result set, or NULL for the first request
    @LastDateTime datetime = null
AS
BEGIN
    select top (@NumRecords) *
    from yourtable
    where LOGTime < isnull(@LastDateTime, getdate())
    order by LOGTime desc
END
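Calling it would look like this (pass NULL on the first request, then the LOGTime of the last row you received):

exec dbo.select_myrecords @NumRecords = 1000, @LastDateTime = null;
exec dbo.select_myrecords @NumRecords = 1000, @LastDateTime = '2009-07-17T13:36:00';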
Without any index you cannot efficiently select the "last" records, so the solution will not scale. You cannot use "real-time" and "repeated table scans of a big logging table" in the same sentence.
Actually, without any unique identifying attribute for each row you cannot even determine what's new (proof: say you had a table full of thousands of booleans; how would you determine which ones are new? They cannot be told apart). There must be something you can use, like a combination of DateTime and IP. Alternatively, you can add an IDENTITY column, which is likely to be transparent to the software you use.
The software will probably also tolerate you creating an index on an ID or DateTime column, as this too is transparent to it. It might create more write load, so be sure to test the change (my guess: you'll be fine).
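A minimal sketch of that suggestion, assuming a hypothetical table name InternetUsageLog:

-- add a surrogate key the software never has to supply...
ALTER TABLE InternetUsageLog ADD LogID bigint IDENTITY(1,1) NOT NULL;

-- ...and index it so "rows after the last one I read" becomes a seek, not a scan
CREATE INDEX IX_InternetUsageLog_LogID ON InternetUsageLog (LogID);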

Simulating an identity column within an insert trigger

I have a table for logging that needs a log ID, but I can't use an identity column because the log ID is part of a composite key.
create table StuffLogs
(
    StuffID int,
    LogID int,
    Note varchar(255)
)
There is a composite key on StuffID & LogID.
I want to build an insert trigger that computes the next LogID when inserting log records. I can do it for one record at a time (see below for how LogID is computed), but that's not really effective, and I'm hoping there's a way to do this without cursors.
select @NextLogID = isnull(max(LogID), 0) + 1
from StuffLogs
where StuffID = (select StuffID from inserted)
The net result should allow me to insert any number of records into StuffLogs with the LogID column auto computed.
StuffID LogID Note
123     1     foo
123     2     bar
456     1     boo
789     1     hoo
Inserting another record using StuffID: 123, Note: bop will result in the following record:
StuffID LogID Note
123     3     bop
Unless there is a rigid business reason that requires each LogID to be a sequence starting from 1 for each distinct StuffID, then just use an identity. With an identity, you'll still be able to order rows properly with StuffID+LogID, but you'll not have the insert issues of trying to do it manually (concurrency, deadlocks, locking/blocking, slow inserts, etc.).
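A sketch of that identity-based alternative, using the question's table; LogID values will not be dense per StuffID, but StuffID+LogID still orders rows correctly:

create table StuffLogs
(
    StuffID int not null,
    LogID   int identity(1,1) not null,
    Note    varchar(255),
    constraint PK_StuffLogs primary key (StuffID, LogID)
);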
Make sure LogID has a default value of NULL, so that it need not be supplied in insert statements, as if it were an identity column.
CREATE TRIGGER StuffLogs_Insert ON dbo.StuffLogs
INSTEAD OF INSERT
AS
INSERT StuffLogs (StuffID, LogID, Note)
SELECT i.StuffID,
       ROW_NUMBER() OVER (PARTITION BY i.StuffID ORDER BY i.StuffID)
           + ISNULL(mx.MaxLogID, 0),
       i.Note
FROM inserted i
LEFT JOIN (SELECT StuffID, MAX(LogID) AS MaxLogID
           FROM StuffLogs
           GROUP BY StuffID) mx
    ON mx.StuffID = i.StuffID
You would need to thoroughly test this to ensure that two connections inserting into the table at the same time do not produce collisions on LogID.

SQL standard select current records from an audit log question

My memory is failing me. I have a simple audit log table based on a trigger:
ID int (identity, PK)
CustomerID int
Name varchar(255)
Address varchar(255)
AuditDateTime datetime
AuditCode char(1)
It has data like this:
ID CustomerID Name    Address             AuditDateTime           AuditCode
1  123        Bob     123 Internet Way    2009-07-17 13:18:06.353 I
2  123        Bob     123 Internet Way    2009-07-17 13:19:02.117 D
3  123        Jerry   123 Internet Way    2009-07-17 13:36:03.517 I
4  123        Bob     123 My Edited Way   2009-07-17 13:36:08.050 U
5  100        Arnold  100 SkyNet Way      2009-07-17 13:36:18.607 I
6  100        Nicky   100 Star Way        2009-07-17 13:36:25.920 U
7  110        Blondie 110 Another Way     2009-07-17 13:36:42.313 I
8  113        Sally   113 Yet another Way 2009-07-17 13:36:57.627 I
What would an efficient select statement be to get the most current record for each customer between a start and end time? FYI: I is insert, D is delete, and U is update.
Am I missing anything in the audit table? My next step is to create an audit table that only records changes, yet still lets me extract the most recent records for a given time frame. For the life of me I cannot find this easily on any search engine. Links would work too. Thanks for the help.
Another (better?) method to keep audit history is to use a 'startDate' and 'endDate' column rather than an auditDateTime and AuditCode column. This is often the approach in tracking Type 2 changes (new versions of a row) in data warehouses.
This lets you more directly select the current rows (WHERE endDate is NULL), and you will not need to treat updates differently than inserts or deletes. You simply have three cases:
Insert: copy the full row along with a start date and NULL end date
Delete: set the End Date of the existing current row (endDate is NULL)
Update: do a Delete then Insert
Your select would simply be:
select * from AuditTable where endDate is NULL
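A minimal sketch of that start/end-date layout, reusing the question's columns:

create table CustomerAudit
(
    ID         int identity primary key,
    CustomerID int,
    Name       varchar(255),
    Address    varchar(255),
    StartDate  datetime not null,
    EndDate    datetime null  -- NULL marks the current version of the row
);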
Anyway, here's my query for your existing schema:
declare @from datetime
declare @to datetime

select b.*
from (
    select
        customerId,
        max(auditdatetime) 'auditDateTime'
    from AuditTable
    where auditcode in ('I', 'U')
      and auditdatetime between @from and @to
    group by customerId
    having
        /* rely on "current" being defined as INSERTS > DELETES */
        sum(case when auditcode = 'I' then 1 else 0 end) >
        sum(case when auditcode = 'D' then 1 else 0 end)
) a
cross apply (
    select top 1 customerId, name, address, auditdateTime
    from AuditTable
    where auditdatetime = a.auditdatetime
      and customerId = a.customerId
) b
References
A cribsheet for data warehouses, but has a good section on type 2 changes (what you want to track)
MSDN page on data warehousing
Ok, a couple of things about audit log tables.
For most applications, we want audit tables to be extremely quick on insertion.
If the audit log is truly for diagnostic or very irregular audit reasons, then the quickest insertion criterion is to make the table physically ordered by insertion time.
And this means putting the audit time as the first column of the clustered index, e.g.
create unique clustered index idx_mytable on mytable(AuditDateTime, ID)
This will allow for extremely efficient select queries on AuditDateTime (O(log n)) and O(1) insertions (new rows append at the end of the clustered index).
If you wish to look up your audit table on a per-CustomerID basis, then you will need to compromise.
You may add a nonclustered index on (CustomerID, AuditDateTime), which will allow O(log n) lookups of per-customer audit history; the cost is the maintenance of that nonclustered index on insertion, which is itself O(log n) per row.
However, that insertion-time penalty may be preferable to the table scan (O(n)) you will pay if you have no index on CustomerID and this is a regularly performed query.
An O(n) lookup also locks the table against the writing process, so it is sometimes in the writers' interest to insert slightly slower if that guarantees readers won't block their commits by table-scanning for lack of a good index.
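A sketch of that nonclustered index, using the same hypothetical table name as the clustered example above:

create nonclustered index idx_mytable_customer on mytable (CustomerID, AuditDateTime)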
Addition: if you are looking to restrict to a given timeframe, the most important thing first of all is the index upon AuditDateTime. And make it clustered as you are inserting in AuditDateTime order. This is the biggest thing you can do to make your query efficient from the start.
Next, if you are looking for the most recent update for all CustomerIDs within a given timespan, then a full scan of the data, restricted by insertion date, is required.
You will need a subquery over your audit table, restricted to the range:
select CustomerID, max(AuditDateTime) MaxAuditDateTime
from AuditTrail
where AuditDateTime >= @begin and AuditDateTime <= @end
and then incorporate that into your main select query, e.g.
select AuditTrail.*
from AuditTrail
inner join
(
    select CustomerID, max(AuditDateTime) MaxAuditDateTime
    from AuditTrail
    where AuditDateTime >= @begin and AuditDateTime <= @end
) filtration
    on filtration.CustomerID = AuditTrail.CustomerID
   and filtration.MaxAuditDateTime = AuditTrail.AuditDateTime
Another approach is to use a subselect that finds the latest audit row per customer:
select a.ID
     , a.CustomerID
     , a.Name
     , a.Address
     , a.AuditDateTime
     , a.AuditCode
from myauditlogtable a
inner join
(
    select CustomerID, max(AuditDateTime) as maxAuditDateTime
    from myauditlogtable
    group by CustomerID
) subq
    on subq.CustomerID = a.CustomerID
   and subq.maxAuditDateTime = a.AuditDateTime;
Start and end time, e.g. between 1am and 3am?
Or start and end datetime, e.g. 2009-07-17 13:36 to 2009-07-18 13:36?

Is it possible to INSERT SELECT a collection of aggregate values, then update another table based on the @@IDENTITY values of the inserts you just made?

I'll begin by admitting that my problem is most likely the result of bad design since I can't find anything about this elsewhere. That said, let's get dirty.
I have an Activities table and an ActivitySegments table. The activities table looks something like:
activityid (ident) | actdate (datetime) | actduration (datetime) | ticketnumber (numeric) |
ActivitySegments looks something like
segmentid (ident) | ticketid (numeric) | activityid (numeric) | startdate | starttime | enddate | endtime
This is a time tracking function of an intranet. The "old way" of doing things is just using the activity table. They want to be able to track individual segments of work throughout the day with a start/stop mechanism and have them roll up into records in the activities table. The use case is the user should be able to select any/all segments that they've worked on that day and have them be grouped by ticketid and inserted into the activity table. I have that working. I'm sending a string of comma separated values that correspond to segmentids to a sproc that puts them in a temp table. So I have the above two tables and a temp table with one column of relevant segmentids. Can't they all just get along?
What I need is to take these passed ActivitySegment IDs, group them by ticket number, and sum the duration worked on each ticket (I already have the SQL for that). Then insert this dataset into the Activities table BUT also get the new activityid (@@IDENTITY) for each row and update the ActivitySegments table with the appropriate value.
In procedural programming I'd for-loop the insert, get the @@IDENTITY, and do something else to figure out which segmentids went into creating that activityid. I'm pretty sure I'm thinking about this all wrong, but the deadline approaches and I've been staring at SQL Management Studio for two days, wasted sheets of paper, and burned through way too many cigarettes. I see SQL for Smarties in my near future; until then, can someone help me?
try this approach:
declare @x table (tableID int not null primary key identity(1,1), datavalue varchar(10) null)
INSERT INTO @x values ('one')
INSERT INTO @x values ('aaaa')
INSERT INTO @x values ('cccc')

declare @y table (tableID int not null primary key, datavalue varchar(10) null)

declare @count int ---------------FROM HERE, see comment
set @count = 5;
WITH hier(cnt) AS
(
    SELECT 1 AS cnt
    UNION ALL
    SELECT cnt + 1
    FROM hier
    WHERE cnt < @count
) -----------------------TO HERE, see comment
INSERT INTO @x (datavalue)
OUTPUT INSERTED.tableID, INSERTED.datavalue
    INTO @y
SELECT 'value=' + CONVERT(varchar(5), h.cnt)
FROM hier h
ORDER BY cnt DESC

select '@x', * from @x --table you just inserted into
select '@y', * from @y --captured data, including identity values
here is the output of the SELECTs:
     tableID     datavalue
---- ----------- ----------
@x   1           one
@x   2           aaaa
@x   3           cccc
@x   4           value=5
@x   5           value=4
@x   6           value=3
@x   7           value=2
@x   8           value=1

(8 row(s) affected)

     tableID     datavalue
---- ----------- ----------
@y   4           value=5
@y   5           value=4
@y   6           value=3
@y   7           value=2
@y   8           value=1
The "FROM HERE" - "TO HERE" part is just a fancy way to create a table to join to; you can use your own table to join to there.
Use @y to process your updates: update from it and join it in, as sketched below.
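Continuing the same batch, a hypothetical update-join that consumes @y (Details and parentID are made-up names for whatever table you need to back-fill):

update d
set d.parentID = y.tableID
from Details d
join @y y on y.datavalue = d.datavalue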
It can be tricky to implement the case where you (1) insert a set of rows and (2) use the identity values generated by them.
One approach is to have some kind of tracking column on the table that generates the IDs. For example, add a TransactionGUID (uniqueidentifier) column to the table whose identity values you want to capture. When you insert the rowset into this table, you stamp it with a given GUID and can then harvest the set of identity values after the insert completes.
The other common approach is just to do it iteratively, like you mentioned.
There is probably a better way to architect what you want to do, but if you must use your current approach (and if I understand correctly what it is you are doing), then adding the TransactionGUID may be the easiest fix.
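A minimal sketch of the TransactionGUID approach against the question's schema; #segments stands in for the temp table of passed segment IDs, and the exact column lists are assumptions:

declare @tx uniqueidentifier = newid();

-- insert one Activities row per ticket, stamped with the batch GUID
insert into Activities (actdate, ticketnumber, TransactionGUID)
select min(s.startdate), s.ticketid, @tx
from ActivitySegments s
join #segments p on p.segmentid = s.segmentid
group by s.ticketid;

-- back-fill the segments with the identity values generated above
update s
set s.activityid = a.activityid
from ActivitySegments s
join #segments p on p.segmentid = s.segmentid
join Activities a
    on a.ticketnumber = s.ticketid
   and a.TransactionGUID = @tx;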