How to find the last number of a sequence - SQL

I am using Spring Batch in my application. I have an upload process in which data is loaded into the DB from an Excel file. The Excel file has five spreadsheets, which are loaded into five different tables. If the upload is successful, there is no problem, but when the upload fails, say in the 3rd sheet, I roll back the entire upload. This causes the sequence on the first table to skip numbers. How can I solve this issue? My sequence is incremented by 1 and has NOCACHE.
Thanks in advance.

To solve this issue, you most probably have to change your expectations. Sequences usually don't roll back when they are used in a transaction. Example:
T1: insert into table A, using sequence S to generate PK values
T2: insert into table B, using sequence S to generate PK values
Now you roll back T1. What should happen to the PK values in T2? Should they be renumbered? What if someone (like a Java program) has already read the PK values?
You can see that even for pretty simple cases, it's impossible to roll back sequences.
Some databases have the concept of identity columns, where the DB server internally assigns keys, but even for those you will have gaps.
If you really need an uninterrupted flow of PKs, you will have to use your own sequence (or a counter in your Java code) and maintain/reset this one yourself.
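If you go the do-it-yourself route, a minimal T-SQL-flavoured sketch of such a counter could look like this (table and column names are made up for the example; adapt the syntax to your database):
-- hypothetical gap-free counter table
CREATE TABLE upload_counter (last_value INT NOT NULL);
INSERT INTO upload_counter VALUES (0);
-- inside the same transaction as the five inserts:
DECLARE @next INT;
UPDATE upload_counter SET @next = last_value = last_value + 1;
-- use @next as the key; because the counter lives in an ordinary table,
-- rolling back the upload also rolls back the increment, so no gap appears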

Related

SEQUENCE number on every INSERT in MS SQL 2012

I am in a situation where multiple users insert values from an application into the database via a web service; I am using a stored procedure to validate and insert the records.
The requirement is to create a unique number for each entry, but strictly in SEQUENCE only. I added an identity column, but it misses some of the numbers in between, e.g. 25, 26, 27, 29, 34...
Our requirement is to strictly generate the next number only, as we would for an invoice number / order number / receipt number etc.: 1, 2, 3, 4, 5...
I checked the link below about sequence numbers, but I am not sure it will resolve my issue. Can someone please assist with this?
Sequence Numbers
If you absolutely, positively cannot have gaps, then you need to use a trigger and your own logic. This puts a lot of overhead into inserts, but it is the only guarantee.
Basically, the parts of a database that protect the data get in the way of doing what you want. If a transaction uses a sequence number (or identity) and it is later rolled back, then what happens to the generated number? Well, that is one way that gaps appear.
Of course, you will have to figure out what to do in that case. I would just go for an identity column and work on educating users that gaps are possible. After all, if you don't want gaps on output, then row_number() is available to re-assign numbers.
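For the renumber-on-output option, a hedged sketch (table and column names are illustrative):
SELECT ROW_NUMBER() OVER (ORDER BY Id) AS SeqNo,  -- gap-free number, assigned at read time
       Id, InvoiceDate, Amount                    -- stored columns keep their identity values
FROM   dbo.Invoices;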

How to validate millions of rows of data?

How to validate the scenario?
Scenario 1:
The source file is a flat file that contains millions of records.
All the data from the source file is loaded into the target table in the database.
Now the question is: how do we validate that all the data has been loaded into the target table correctly?
Note: we can't use XLS to validate, as there are millions of records.
There are lots of ways one can validate data. Much of it depends on three things:
How much time do you have for validation?
What are your processing capabilities?
Is the data on a QA or Production SQL server?
If you are in QA and have lots of processing power, you can do basic checks:
Were there any warnings or errors during the data load?
Count the total number of items in the database vs. the raw file
Count the total number of null records in the database
Check the total number of columns vs. the raw file
Check the length of the variables. Are they as expected?
Are any character columns unexpectedly truncated?
Are numeric columns out to the correct number of significant digits?
Are dates reasonable? For example, if you expected dates from 2004, do they say 1970?
How many duplicates are there?
Check whether the data in the columns makes sense. A few questions you can ask: are any rows "shifted"? Are numeric variables in numeric columns? Is the key column actually a key? Do the column names make sense? Your check of null records should help detect these things.
Can you manually calculate any columns and compare your calculation to the one in the file?
If you are low on processing power, or are on a production server and do not want to risk degrading performance for other users, you can do many of the above checks with a simple random sample. Take, say, 100,000 rows at a time, or stratify the sample if needed.
These are just a few checks you can do. The more comparisons and sanity checks, the better off you are.
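As a hedged illustration, here are a few of those basic checks expressed in SQL (all table and column names are invented for the example):
-- row count to compare against the raw file's line count
SELECT COUNT(*) AS total_rows FROM dbo.TargetTable;
-- rows where a mandatory column arrived NULL
SELECT COUNT(*) AS null_keys FROM dbo.TargetTable WHERE KeyColumn IS NULL;
-- duplicate keys
SELECT KeyColumn, COUNT(*) AS cnt
FROM dbo.TargetTable
GROUP BY KeyColumn
HAVING COUNT(*) > 1;
-- date sanity check (e.g. spotting 1970 values where you expected 2004)
SELECT MIN(LoadDate) AS min_date, MAX(LoadDate) AS max_date FROM dbo.TargetTable;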
Most importantly, communicate these findings, and anything that seems strange, to the file owner. They should be able to give you additional insight into whether the data load is correct, or whether they even gave you the right file in the first place.
You're loading the data and providing as many reasonable checks as possible. If they're satisfied with the outcome, and you're satisfied with the outcome, you should consider the data valid.
I think the most complete solution would be to export the table back to a 2nd flat file that should be identical to the first, and then write a script that does a line by line diff check. You will be able to see if even a single row is different.
Given that you are migrating millions of rows of data, I'm assuming that running a script overnight is not a big deal compared to the value of data integrity.
For quick validation you can just check that the row counts are the same and that there's no obviously bad data like for example a column mapped wrong or an entire column being null.
I'm no expert on importing from files, but if I had to solve this issue I would do something like this:
Load the file into a plain TableA with no restrictions, so the import process runs OK.
Create another TableB with all the validations: types, string lengths, FKs.
Create a stored procedure to move the data from TableA to TableB.
Include error catching and insert into another table, Errors, where you record row_id and err_description.
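A hedged T-SQL sketch of that procedure, with all names invented for the example; rows are processed one by one so that a bad row only produces an Errors entry instead of failing the whole move:
CREATE PROCEDURE dbo.MoveValidatedRows
AS
BEGIN
    DECLARE @row_id INT;
    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR SELECT row_id FROM dbo.TableA;
    OPEN cur;
    FETCH NEXT FROM cur INTO @row_id;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        BEGIN TRY
            -- TableB carries the real types, lengths and FK constraints
            INSERT INTO dbo.TableB (row_id, col1, col2)
            SELECT row_id, col1, col2 FROM dbo.TableA WHERE row_id = @row_id;
        END TRY
        BEGIN CATCH
            INSERT INTO dbo.Errors (row_id, err_description)
            VALUES (@row_id, ERROR_MESSAGE());
        END CATCH;
        FETCH NEXT FROM cur INTO @row_id;
    END;
    CLOSE cur;
    DEALLOCATE cur;
END;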

Reverting a database insertion with log files?

I am working on a program that is supposed to insert hundreds of rows into the database per run.
The problem is that if the inserted data turns out to be wrong, how can we recover from that run? Currently I only have a log file (in a format I created), which records the raw data that gets inserted (no metadata or primary keys). Is there a way to create a log that the database can understand, so that when we want to undo the insertion we can feed the database that log file?
Or, if there is an alternative mechanism for undoing an operation from a program, kindly let me know. Thanks.
The fact that this is only hundreds of rows makes it susceptible to the great-grandmother of all undo mechanisms:
have a table importruns with a row for each run you do. I assume it has an integer auto-increment PK
add a field to your data table that carries the PK of the import run
for insert-only runs, you just need to DELETE FROM sometable WHERE importid=$whatever
If you also have replace/update imports, go one step further:
for each data table, have a corresponding table that has one more field: superseededby
for each row you update/replace, place an original copy of the row in this table, plus the import id in superseededby
to revert, you now also have to run INSERT INTO originaltable SELECT * FROM superseededtable WHERE superseededby=$whatever
You can clean up superseededtable for known-good imports, to make sure storage doesn't grow without bound.
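A minimal sketch of that bookkeeping in SQL Server flavour (table and variable names are illustrative; @badRun stands for the id of the run to undo):
CREATE TABLE importruns (
    importid   INT IDENTITY(1,1) PRIMARY KEY,
    started_at DATETIME NOT NULL DEFAULT GETDATE()
);
-- every imported row records which run produced it
ALTER TABLE sometable ADD importid INT NULL;
-- reverting an insert-only run:
DELETE FROM sometable WHERE importid = @badRun;
-- putting back rows that the run replaced/updated:
INSERT INTO sometable (col1, col2, importid)
SELECT col1, col2, importid        -- the original columns, without superseededby
FROM   superseededtable
WHERE  superseededby = @badRun;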
You have several options, depending on when you notice the error.
If you know while the data is being loaded that it is wrong, you can use the transaction API to roll back the changes of the current transaction.
If you only find out about the error later, you can create your own log. Generate an id identifying the run/transaction, and add a field to the relevant table where that id is stored. This lets you identify exactly which transaction each row came from. You can also create a stored procedure that deletes rows for a given transaction id.
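A hedged sketch of that last idea (all names are invented): tag each row with the run id, then a small procedure can undo a given run:
ALTER TABLE dbo.TargetTable ADD run_id INT NULL;
GO
CREATE PROCEDURE dbo.UndoRun @run_id INT
AS
BEGIN
    -- removes everything a given program run inserted
    DELETE FROM dbo.TargetTable WHERE run_id = @run_id;
END;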

How are unique IDs / sequence numbers generated in SAP B1?

I'm wondering if anyone knows how SAP B1 (SAP Business One) generates the unique primary keys it uses in various tables. Examples of what I am talking about would include OCRD.DocEntry and OCPR.CntctCode. These are integer columns that get "automatically" incremented.
Typical approaches for doing this include identity columns (e.g., SQL Server), sequences (e.g., Oracle), or manual sequence tables holding a Nextval which is programmatically incremented. As best I can tell, B1 is not using any of these techniques for these columns. So how is it handling them?
The particular instance I'm looking at is using an SQL Server database.
Yes, I'm well aware of the fact that there is no "need" for me to know about the inner workings, that I shouldn't be mucking around in the DB, etc. It's just bothering me that I don't know how they are doing it! If anyone can explain, I'd be grateful.
SAP B1 generates new unique numbers using the ONNM table. When a document is added, the following takes place:
SQL Transaction begins
The next number is queried from the ONNM table with an update lock
The ONNM table is updated with the new number (+1).
The document is added
The SQL transaction is committed.
Running an SQL SELECT statement with an update lock returns the current row while simultaneously locking that row until the end of the transaction. You are guaranteed that no other user can change that row between when you select it and when the transaction ends.
You can use SQL Profiler to watch the statements executed when you perform actions in SAP B1. Here is the line that gets the next number to use in a Quotation. Quotations are ObjectType 23.
SELECT T0.* FROM [dbo].[ONNM] T0 WITH (UPDLOCK) WHERE T0.[ObjectCode] = '23'
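Putting the steps above together, a sketch of the whole pattern might look like this (an illustration only, not SAP's actual code; the AutoKey column name is an assumption based on the "auto key" the answer below mentions):
BEGIN TRANSACTION;
DECLARE @next INT;
-- read the next number while locking the ONNM row until commit
SELECT @next = T0.[AutoKey]
FROM   [dbo].[ONNM] T0 WITH (UPDLOCK)
WHERE  T0.[ObjectCode] = '23';
UPDATE [dbo].[ONNM]
SET    [AutoKey] = [AutoKey] + 1
WHERE  [ObjectCode] = '23';
-- ... insert the quotation rows using @next as DocEntry ...
COMMIT TRANSACTION;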
SAP B1 uses ONNM for sequence number generation.
It maintains an auto key for every object that is registered in it, and the sequence number is generated based on that auto key.
For every add event, this auto key is incremented by 1.

SQL Identity Column out of step

We have a set of databases that have a table defined with an identity column as the primary key. Because a subset of these are replicated to other servers, a seed system was created so that they could never clash. That system works by giving each database a different starting seed, with an increment of 50.
In this way the table on DB1 would generate 30001, 30051, etc., while Database2 would generate 30002, 30052, and so on.
I am looking at adding another database into this system (it is split for scaling/loading purposes) and have discovered that the identities have got out of sync on one or two of the databases - i.e. database 3, which should have numbers ending in 3, doesn't anymore. The seeding and increments are still correct according to the table design.
I am obviously going to have to work around this problem somehow (probably by setting a high initial value), but can anyone tell me what would cause them to get out of sync like this? From a query on the DB I can see the sequence went as follows: 32403, 32453, 32456, 32474, 32524, 32574, and has continued in increments of 50 ever since it went wrong.
As far as I am aware no bulk-inserts or DTS or anything like that has put new data into these tables.
Second (bonus) question - how to reset the identity so that it goes back to what I want it to actually be!
EDIT:
I know the design is in principle a bit ropey - I didn't ask for criticism of it, I just wondered how it could have got out of sync. I inherited this system and changing the column to a GUID - whilst undoubtedly the best theoretical solution - is probably not going to happen. The system evolved from a single DB to multiple DBs when the load got too large (a few hundred GBs currently). Each ID in this table will be referenced in many other places - sometimes a few hundred thousand times each (multiplied by about 40,000 for each item). Updating all those will not be happening ;-)
Replication = GUID column.
To set the value of the next ID to be 1000:
DBCC CHECKIDENT (orders, RESEED, 999)
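If you first want to see where the identity currently stands, the same command can report the current value without changing anything:
DBCC CHECKIDENT (orders, NORESEED)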
If you want to actually use Primary Keys for some meaningful purpose other than uniquely identify a row in a table, then it's not an Identity Column, and you need to assign them some other explicit way.
If you want to merge rows from multiple tables, then you are violating the intent of Identity, which is for one table. (A GUID column will use values that are unique enough to solve this problem. But you still can't impute a meaningful purpose to them.)
Perhaps somebody used:
SET IDENTITY_INSERT {tablename} ON
INSERT INTO {tablename} (ID, ...)
VALUES(32456, ....)
SET IDENTITY_INSERT {tablename} OFF
Or perhaps they used DBCC CHECKIDENT to change the identity. In any case, you can use the same command to set it back.
It's too risky to rely on this kind of identity strategy, since it's (obviously) possible that it will get out of synch and wreck everything.
With replication, you really need to identify your data with GUIDs. It will probably be easier for you to migrate your data to a schema that uses GUIDs for PKs than to try and hack your way around IDENTITY issues.
To address your question directly,
Why it got out of sync may be interesting to discuss, but the only result you could draw from the answer would be how to prevent it in the future; that is a bad course of action. You will continue to have these and bigger problems unless you deal with the design, which has a fatal flaw.
How to set the existing values right is also (IMHO) an invalid question, because you need to do something other than set the values right - it won't solve your problem.
This isn't to disparage you, it's to help you the best way I can think of. Changing the design is less work both short term and long term. Not to change the design is the pathway to FAIL.
This doesn't really answer your core question, but one possibility for addressing the design would be to switch to a hi_lo algorithm. It wouldn't require changing the column away from an int, so it shouldn't be nearly as much work as changing to a GUID.
Hi_lo is used by the NHibernate ORM, but I couldn't find much documentation on it.
Basically, the way hi_lo works is that you have one central place where you keep track of your "hi" value: one table, in one of the databases, that every instance of your insert application can see. Then you need some kind of service (object, web service, whatever) that lives somewhat longer than a single entity insert. When this service starts up, it goes to the hi table, grabs the current value, and then increments the value in that table. Do this under a lock so that you won't get any concurrency issues with other instances of the service. You then use the service to get your next id value. Internally it starts at the number it got from the db and, each time it passes a value out, increments by 1, keeping track of the current value and the "range" it is allowed to pass out. A simplistic example would be this:
Service 1 gets 100 from the "hi_value" table in the db and increments the db value to 200.
Service 1 gets a request for a new id and passes out 100.
Another instance of the service, service 2 (another thread, another middle-tier worker machine, etc.), spins up, gets 200 from the db, and increments the db value to 300.
Service 2 gets a request for a new id and passes out 200.
Service 1 gets a request for a new id and passes out 101.
If any of these ever passes out more than 100 values before dying, it will go back to the db, get the current value, increment it, and start over. Obviously there's some art to this: how big should your range be, etc.
A very simple variation on this is a single table in one of your DBs that just contains the "nextId" value, basically manually reproducing Oracle's sequence concept.
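A hedged T-SQL sketch of the hi-table hand-off described above (names and block size are illustrative; the single "nextId" variation is the same statement with an increment of 1):
CREATE TABLE hi_value (next_hi INT NOT NULL);
INSERT INTO hi_value VALUES (100);
-- each service instance grabs a block of 100 ids in one atomic statement:
DECLARE @hi INT;
UPDATE hi_value SET @hi = next_hi, next_hi = next_hi + 100;
-- @hi receives the pre-update value, so this instance can hand out
-- @hi, @hi + 1, ... @hi + 99 from memory and only comes back here
-- once the block is exhausted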