I maintain a ticketing system that keeps track of Work Order numbers, and every so often the WO_NUM count jumps. The WO_NUM should be the same as the WOID, but for some reason, after years of using this system, the WO_NUM started to be 1 greater than the WOID.
The count for WO_NUM jumps by 2 instead of 1, from WO_NUM 229912 to WO_NUM 229914.
Then a few months (128 days, to be exact) later, it jumps again, this time to 2 greater than the WOID:
The count for WO_NUM jumps from 239946 to 239948.
This happens again 18 days later, but this time to 3 greater than the WOID, with WO_NUM jumping from 241283 to 241285 while WOID increments normally from 241281 to 241282.
And again 7 days later, to 4 greater than the WOID, with WO_NUM jumping from 241897 to 241899 while WOID increments normally from 241894 to 241895.
This seems to keep getting further and further off, and it is happening almost exponentially faster. Any idea why this might be, or how I might go about fixing it?
If you are using an IDENTITY field in SQL Server or a similar auto-increment mechanism in another system, there is no guarantee that your IDs will be consecutive. If you try to insert a new row and the insert fails, the ID that would have been used is skipped. This is to allow another insert to begin while the other is in process without an ID collision.
If you need (not just want) IDs to be consecutive, then you'll have to do something like the following (see the sketch after this list):
Create a locking mechanism so that inserts are atomic.
Use a key table that will store the next available ID for your table
Only increment the key table if the insert succeeds.
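A minimal T-SQL sketch of that pattern (the key table, target table, and procedure names here are hypothetical, not from the original system):

-- Hypothetical key table holding the next available ID per table.
CREATE TABLE dbo.KeyTable
(
    TableName sysname NOT NULL PRIMARY KEY,
    NextId    int     NOT NULL
);
GO

CREATE PROCEDURE dbo.InsertWorkOrder @Description varchar(200)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- any error rolls the whole transaction back

    BEGIN TRANSACTION;

    -- UPDLOCK + HOLDLOCK serializes concurrent callers on this row,
    -- so two inserts can never read the same NextId.
    DECLARE @Id int =
        (SELECT NextId
         FROM dbo.KeyTable WITH (UPDLOCK, HOLDLOCK)
         WHERE TableName = N'WorkOrders');

    INSERT INTO dbo.WorkOrders (WOID, [Description])  -- hypothetical target table
    VALUES (@Id, @Description);

    -- The counter only advances if the insert above succeeded.
    UPDATE dbo.KeyTable
    SET NextId = NextId + 1
    WHERE TableName = N'WorkOrders';

    COMMIT TRANSACTION;
END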
That said, obviously this adds a lot of risk to your system, and doesn't address what happens if you delete a record. I would reconsider whether you need consecutive IDs and whether that feature is worth the extra development and overhead.
Figured it out myself. It turns out that if a work order was started and then cancelled without being saved, WO_NUM was being incremented and not rolled back. Thanks to all who tried to help, and sorry I didn't provide the requisite information. I'll make sure to do better next time!
What I need to do is to write to the same row from two different sources (procedures/methods/services).
The first call that comes in creates the row, and the next one just updates it.
This needs to happen without any locking taking place, and if possible I would like to be able to call either source just once (not repeatedly, dealing with locking errors).
Here is roughly what I have now: a third procedure that the others call, which just inserts a row (setting only xyz) or returns true if such a row already exists.
That way it's fast, and it's unlikely that both calls arrive at the same time.
IF EXISTS (SELECT * FROM [dbo].[Wait] WHERE xyz = @xyz)
BEGIN
    -- The row exists because the other data source
    -- has already inserted a row with the same xyz.
    -- UPDATE THE ROW WITH THE DATA COMING IN
END
ELSE
BEGIN
    -- No row with value xyz exists, so we INSERT it
    -- with the extra data.
END
I know this doesn't guarantee the absence of locking. But in my case it's actually unlikely that both calls arrive at the same time, and even if they did, the process is user-controlled, so the users would get an error and just try again. BUT I want to solve this.
I have seen Row Versioning popping up, but I'm not sure whether it helps or how I should use it.
Have a look at Michael J. Swart's article Mythbusting: Concurrent Update/Insert Solutions. It will show you all the possible dos and don'ts, including the fact that MERGE actually doesn't do a great job of solving concurrency issues.
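For reference, one pattern that article validates is taking an update lock up front, so that the existence check and the write form a single atomic step. A hedged sketch against the question's [dbo].[Wait] table (the OtherData column and @OtherData parameter are hypothetical stand-ins for the incoming data):

BEGIN TRANSACTION;

-- UPDLOCK + HOLDLOCK holds a serializable update lock across the
-- check and the write, so two concurrent callers cannot both take
-- the INSERT branch.
IF EXISTS (SELECT 1 FROM [dbo].[Wait] WITH (UPDLOCK, HOLDLOCK)
           WHERE xyz = @xyz)
BEGIN
    UPDATE [dbo].[Wait]
    SET OtherData = @OtherData    -- hypothetical column
    WHERE xyz = @xyz;
END
ELSE
BEGIN
    INSERT INTO [dbo].[Wait] (xyz, OtherData)
    VALUES (@xyz, @OtherData);
END

COMMIT TRANSACTION;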
We're troubleshooting a sort of Sync Framework between two SQL Server databases on separate servers (both SQL Server 2008 Enterprise 64-bit SP2 - 10.0.4000.0) through linked server connections, and we've reached a point where we're stuck.
The logic to identify which records are "pending to be synced" is of course based on ROWVERSION values, including the use of MIN_ACTIVE_ROWVERSION() to avoid dirty reads.
All SELECT operations are encapsulated in SPs on each "source" side. This is a schematic sample of one SP:
PROCEDURE LoaderRetrieve(@LastStamp bigint, @Rows int)
BEGIN
    ...
    (vars handling)
    ...
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT

    SELECT TOP (@Rows) Field1, Field2, Field3
    FROM Table
    WHERE [RowVersion] > @LastStampAsRowVersionDataType
      AND [RowVersion] < @MinActiveVersion
    ORDER BY [RowVersion]
END
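The "(vars handling)" part is elided above; presumably it computes the two boundary variables used in the WHERE clause. A hypothetical sketch of what it might contain (a guess, not the original code):

-- Capture the low-water mark once per call so the batch reads a
-- stable upper bound (hypothetical reconstruction).
DECLARE @MinActiveVersion binary(8) = MIN_ACTIVE_ROWVERSION();
DECLARE @LastStampAsRowVersionDataType binary(8) =
    CONVERT(binary(8), @LastStamp);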
The approach works just fine; we usually sync records at the expected rate of 600k/hour (a job every 30 seconds, batch size = 5k). But at some point the sync process does not find a single record to be transferred, even though there are several thousand records with a ROWVERSION value greater than the @LastStamp parameter.
When checking the reason, we found that MIN_ACTIVE_ROWVERSION() has a value less than (or slightly greater than, just 5 or 10 increments) the @LastStamp being searched. This in itself shouldn't be a problem, since the MIN_ACTIVE_ROWVERSION() approach was introduced precisely to avoid dirty reads and the issues that follow from them, BUT:
The problem we see on some occasions, while the above scenario occurs, is that the value of MIN_ACTIVE_ROWVERSION() does not change for a long (really long) period of time: 30-40 minutes, sometimes more than an hour. And this value is by far less than the @@DBTS value.
We first thought this was related to a pending DB transaction not yet committed, as per the MSDN definition of MIN_ACTIVE_ROWVERSION() (link):
Returns the lowest active rowversion value in the current database. A rowversion value is active if it is used in a transaction that has not yet been committed.
But when checking sessions (sys.sysprocesses) with open_tran > 0 for the duration of this issue, we couldn't find any session with a waittime greater than a few seconds, only one or two occurrences of sessions with roughly 5 minutes of waittime.
So at this point we're struggling to understand the situation: MIN_ACTIVE_ROWVERSION() does not change for a huge period of time, yet no uncommitted transactions with long waits are found within that time frame.
I'm not a DBA, and it could be that we're missing something in the picture needed to analyze this problem; researching forums and blogs turned up no other clue. So far open_tran > 0 was the only valid explanation, but under the circumstances I've described, it's clear that something else is going on, and we don't know what.
Any feedback is appreciated.
Well, I finally found the solution after digging a bit more.
The problem is that we were looking for sessions with a long waittime, when the real deal was to find sessions that had had an active batch for a while.
If there's a session with open_tran = 1, then to determine exactly how long this transaction has been open (and, of course, still active and not yet committed), the last_batch field of sys.sysprocesses should be checked.
Using this query:
SELECT
    batchDurationMin  = DATEDIFF(second, last_batch, getutcdate()) / 60.0,
    batchDurationSecs = DATEDIFF(second, last_batch, getutcdate()),
    hostname, open_tran, *
FROM sys.sysprocesses a
WHERE spid > 50          -- skip system sessions
  AND a.open_tran > 0    -- only sessions with an open transaction
ORDER BY last_batch ASC
we could identify a session with an open transaction that had been active for 30+ minutes. With the hostname values, some additional checks within the web services, and DBCC INPUTBUFFER, we found the responsible process.
So the answer to the final question is: there is indeed an active session with an uncommitted transaction, and that is why MIN_ACTIVE_ROWVERSION() does not change. We were just looking for processes with the wrong criteria.
Now that we know which process behaves like this, the next step will be to improve it.
Hope this proves useful to someone else.
I have a Ruby on Rails application that uses MySQL and I need to calculate blocks of free (available) time given a table that has rows of start and end datetimes. This needs to be done for a range of dates, so for example, I would need to look for which times are free between May 1 and May 7. I can query the table with the times that are NOT available and use that to remove periods of time between May 1 and May 7. Times in the database are stored at a fidelity of 15 minutes on the quarter hour, meaning all times end at 00, 15, 30 or 45 minutes. There is never a time like 11:16 or 10:01, so no rounding is necessary.
I've thought about creating a hash keyed by times in 15-minute increments, defaulting all of the values to "available" (1), then iterating over an ordered resultset of rows and flipping the values in the hash to 0 for the times that come back from the database. I'm not sure that's the most efficient way of doing this, and I'm a little concerned about the memory utilization and computational intensity of that approach. This calculation won't happen all the time, but it needs to scale to at least a couple hundred runs a day. It also seems like I would then need to reprocess the entire hash to find the blocks of time that are free, which seems pretty inefficient.
Any ideas on a better way to do this?
Thanks.
I've done this a couple of ways. First, my assumption is that your table shows appointments, and now you want to get a list of un-booked time, right?
So, the first way I did this was like yours: just a hash of unused times. It's slow, limited, and a little wasteful, since I have to re-calculate the hash every time someone needs to know the available times.
The next way I did this was to borrow an idea from the data warehouse people: I built an attribute table of all the time slots I'm interested in. If you build this kind of table, you may want to put more information in there besides the slot times; you might include things like whether the slot is on a weekend, which hour of the day it's in, whether it's during regular business hours, whether it's on a holiday, that sort of thing. Then I do a LEFT JOIN of all the slots between my start and end times against the appointments, keeping the slots where the appointment is NULL, something like:
SELECT slots.*
FROM slots
LEFT JOIN appointments
    ON appointments.slot_id = slots.id   -- hypothetical join key
WHERE slots.start_time >= @range_start   -- hypothetical range bounds
  AND slots.start_time <  @range_end
  AND appointments.id IS NULL
That keeps me from having to re-create the hash every time, and it's using the database to do the set operations, something the database is optimized to do.
Also, if you make your slots table a little rich, you can start running all sorts of queries: not only about the available slots you may be after, but also about the kinds of times that tend to get booked, the kinds of times that tend to always be available, and other interesting questions you might want to answer some day. At the very least, you should keep the fields that tell you whether a slot is even one that can be filled (like the business-hours flag).
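For concreteness, a hedged sketch of what such a slots table might look like in MySQL (all names are illustrative, not from the original answer):

-- One row per 15-minute slot, pre-populated for the date range of
-- interest, with extra attributes to support richer queries.
CREATE TABLE slots
(
    id             INT        NOT NULL AUTO_INCREMENT PRIMARY KEY,
    start_time     DATETIME   NOT NULL,
    end_time       DATETIME   NOT NULL,
    is_weekend     TINYINT(1) NOT NULL,
    hour_of_day    TINYINT    NOT NULL,
    business_hours TINYINT(1) NOT NULL,
    is_holiday     TINYINT(1) NOT NULL,
    KEY idx_start_time (start_time)
);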
Why not have a flag in the row that indicates availability? As time is allocated, flip the flag for every date/time in the appropriate range. For example, May 2, 12pm to 1pm, would be marked as not available.
Then it's a simple matter of querying the date range for every row that has the availability flag set to true.
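A hedged sketch of that query (the table and column names are illustrative):

-- Free 15-minute slots between May 1 and May 7.
SELECT slot_start
FROM time_slots
WHERE slot_start >= '2009-05-01'
  AND slot_start <  '2009-05-08'
  AND is_available = 1
ORDER BY slot_start;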
Given the table:
CREATE TABLE Table1
(
    UniqueID int IDENTITY(1,1)
    -- ...etc
)
Now why would you ever set the increment to something other than 1?
I can understand setting the initial seed value differently. For example if, say, you're creating one database table per month of data (e.g. Table1_082009, Table1_092009) and want to start the UniqueID of the new table where the old one left off. (I probably wouldn't use that strategy myself, but hey, I can see people doing it).
But for the increment? I can only imagine it being of any use in really odd situations, for example:
after the initial data is inserted, maybe someone will later want to turn IDENTITY_INSERT on and insert new rows in the gaps, but for efficient lookups on the index will want the rows to be close to each other?
if you're looking up IDs based directly off a URL and want to make it harder for people to arbitrarily access the other items (for example, instead of the user being able to work out that changing the URL suffix from /GetData?id=1000 to /GetData?id=1001 will show the next record, you set an increment of 437 so that the next URL is actually /GetData?id=1437)? Of course, if this is your "security", then you're probably already in trouble...
I can't think of anything else. Has anyone used an increment that wasn't 1, and why? I'm really just curious.
One idea might be using this to facilitate partitioning of data (though there might be more "automated" ways to do that):
Suppose you have two servers:
On one server, you start at 1 and increment by 2.
On the other server, you start at 2 and increment by 2.
Then, from your application, you send half of the inserts to one server and the other half to the second server: some kind of software load-balancing.
This way, you still keep the ability to identify your entries: the "UniqueID" is still unique, even though the data is split across two servers/tables.
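A minimal sketch of the two identity definitions (the table itself is hypothetical):

-- On server A: odd IDs (1, 3, 5, ...)
CREATE TABLE dbo.Entries
(
    UniqueID int IDENTITY(1, 2) PRIMARY KEY,
    Payload  varchar(100) NOT NULL
);

-- On server B: even IDs (2, 4, 6, ...)
CREATE TABLE dbo.Entries
(
    UniqueID int IDENTITY(2, 2) PRIMARY KEY,
    Payload  varchar(100) NOT NULL
);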
But that's only a wild idea -- there are probably some other uses for it...
Once, for pure fun (oh yeah, we have a wild side to us), we decided to use a negative increment. It was strange to see the numbers grow in size and get smaller in value at the same time.
I could hardly sit still in my chair.
edit (afterthought):
You think the creator of the IDENTITY was in love with FOR loops? You know..
for (i = 0; i <= 99; i += 17)
or, for those non-semicolon folks out there,
For i = 0 To 100 Step 17
Only for entertainment. And you have to be REALLY bored.
I'm experimenting with a personal finance application, and I'm thinking about what approach to take to update running balances when entering a transaction in an account.
Currently, the approach I'm using involves retrieving all records more recent than the inserted/modified one and incrementing their running balances one by one.
For example, given the following transactions:
t1 date = 2008-10-21, amount = 500, running balance = 1000
t2 date = 2008-10-22, amount = 300, running balance = 1300
t3 date = 2008-10-23, amount = 100, running balance = 1400
...
Now suppose I insert a transaction between t1 and t2, then t2 and all subsequent transactions would need their running balances adjusted.
Heh, now that I've written this question out, I think I know the answer... so I'll leave it here in case it helps someone else (or maybe there's even a better approach?).
First, I get the running balance from the previous transaction, in this case, t1. Then I update all following transactions (which would include the new one):
UPDATE transactions
SET running_balance = running_balance + <AMOUNT>
WHERE date > <t1.date>
The only issue I see is that now instead of storing only a date, I'll have to store a time too. Although, what would happen if two transactions had the exact same date/time?
PS: I'd prefer solutions not involving proprietary features, as I'm using both PostgreSQL and SQLite... although a Postgres-only solution would be helpful too.
Some sort of identity/auto-increment column in there would be wise as well, purely for transaction order if nothing else.
Also, in addition to just the date of the transaction, a date on which the transaction was inserted into the database (not always the same) would be wise/helpful as well.
These sorts of things simply help you arrange things in the system and make it easier to change things, e.g. reorder transactions, at a later time.
I think this might work:
I was using both the date and the id to order the transactions, but now I'm going to store both the date and the id in one column and use that for ordering. That way, comparisons (like >) should always work as expected, right? (As opposed to the situation I described earlier, where two rows have the exact same datetime, however unlikely that may be.)
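For comparison, a minimal sketch of the more conventional variant, keeping the two columns separate and letting the ORDER BY break ties (column names assumed from the examples above):

-- The id breaks ties between transactions that share a date,
-- so the ordering is always unambiguous.
SELECT id, date, amount, running_balance
FROM transactions
ORDER BY date, id;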
If you have a large volume of transactions, then you are better off storing the running balance date-wise, or even week/month-wise, in a separate table.
This way, if you are inserting rows for the same date, you only need to change the running balance in one row.
The querying and reporting will be trickier, though: to arrive at the balance after each transaction using this stored running balance, you would take the previous day's running balance and add or subtract the transaction values.
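A hedged sketch of that summary table (names are illustrative; the definition works in both PostgreSQL and SQLite):

-- One row per account per day: inserting another transaction on
-- the same date only touches this one row.
CREATE TABLE daily_balances
(
    account_id   INTEGER        NOT NULL,
    balance_date DATE           NOT NULL,
    end_balance  NUMERIC(12, 2) NOT NULL,
    PRIMARY KEY (account_id, balance_date)
);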