In an Oracle database table, how are auto-incremented sequence values generated with PL/SQL, for example for key or ID columns?
Included is a discussion on sizing table resources based on what you know about the projected growth of all the related tables in a given schema. How many piano tuners are there in the city of Chicago? Probably fewer than the population of the city altogether... :) and so on.
Know your data.
How do I do that? Read on.
Using Database Triggers to Auto Increment Column Values
One possible approach is to use database triggers. The values are kept unique through the use of sequence database objects.
In the example below, the table FUND_PEOPLE has an associated sequence called FUND_PEOPLE_SEQ that supplies its primary key value.
The Parts List: What you need for the example.
CREATE TABLE "FUND_PEOPLE"
( "PERSON_ID" NUMBER NOT NULL ENABLE,
"ANONYMOUS_IND" VARCHAR2(1) NOT NULL ENABLE,
"ALIAS_IND" VARCHAR2(1) NOT NULL ENABLE,
"ALIAS_NAME" VARCHAR2(50),
"FIRST_NAME" VARCHAR2(50),
"LAST_NAME" VARCHAR2(50),
"PHONE_NUMBER" VARCHAR2(20),
"EMAIL_ADDRESS" VARCHAR2(100),
CONSTRAINT "FUND_PEOPLE_PK" PRIMARY KEY ("PERSON_ID") ENABLE
) ;
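The trigger below refers to a sequence named FUND_PEOPLE_SEQ, which is not shown in the listing; a minimal definition might look like the following sketch (the START WITH value of 20 matches the discussion further down, the remaining options are assumptions):
CREATE SEQUENCE "FUND_PEOPLE_SEQ"
  START WITH 20
  INCREMENT BY 1
  NOCACHE
  NOCYCLE;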
This is an older sample of code, but the
SELECT ... INTO ... FROM DUAL
construct was what we used with older releases of the Oracle Database. Release 11g changed that, so a sequence value can now be assigned directly to a PL/SQL variable or referenced directly in a SQL statement. Both approaches work.
Challenge yourself to figure out what the revised PL/SQL might look like...
CREATE OR REPLACE TRIGGER "BI_FUND_PEOPLE"
  before insert on "FUND_PEOPLE"
  for each row
begin
  if :NEW."PERSON_ID" is null then
    :NEW."PERSON_ID" := "FUND_PEOPLE_SEQ".nextval;
  end if;
end;
/
The notation is:
"BI" for Before INSERT.
:NEW for the value the column will have after the triggering event.
PERSON_ID is the column that can be omitted from the insert statement. The starting value of the sequence in this example is 20. A call to SEQ.NEXTVAL increments the sequence by the amount identified in the sequence declaration (its INCREMENT BY clause) and returns the new value.
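For comparison, here is a sketch of how the same trigger body was typically written before 11g, using the SELECT ... INTO ... FROM DUAL construct mentioned above:
create or replace trigger "BI_FUND_PEOPLE"
  before insert on "FUND_PEOPLE"
  for each row
begin
  if :NEW."PERSON_ID" is null then
    select "FUND_PEOPLE_SEQ".nextval
      into :NEW."PERSON_ID"
      from dual;
  end if;
end;
/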
A Brief Discussion on Sequence Sizing
If you create a sequence without specifying a MAXVALUE, the system assigns the largest possible value the database can handle, which is 28 digits long.
Now what are we going to do with a table whose record count could run to
twenty-eight orders of magnitude:
"9999999999999999999999999999"?
Hmmmm... I've heard of "defensiveness in coding", but for most applications, this is... indefensible?
Consider looking at the original project design I worked with on Google Code:
Google Code Project: Fundraiser Organizer
This is the rough relationship in record counts between the entities in the
schema I set up. The two circles are metrics that are proportional
to the values they calculate or aggregate. Considering the data
cases, what are the relative magnitudes of each record count?
A look at the first screenshot shows my estimates. This project started out as a sheet of data used to keep track of a fund-raising drive. Hope you find it informative and thought-inspiring!
The UI and my testing efforts were conducted on a public demo site hosted by Oracle, the Oracle APEX Demo System.
Related
I have made a library management system using PostgreSQL and I would like to limit the number of books a student/employee is able to borrow. If someone wants to add a new tuple where a student/employee has borrowed a book, and that particular user has already borrowed, for example, 7 books, the table should not accept another addition.
In my view, you either need to handle this from a business-logic perspective, i.e. before the insert, retrieve the data for the specific student and then take action,
or
from a rule-based perspective:
do not wait for additional rows to be inserted by the application, but constantly watch the table's count; upon reaching the limit, the database notifies the app instead.
You can call/trigger a stored procedure based on the
number of books taken by a specific user; if count_num_books > 7,
then the app would handle it.
Please take a look at ON CONFLICT, as described here:
http://www.postgresqltutorial.com/postgresql-upsert/
You can create a stored procedure with insert on conflict and take action accordingly.
INSERT INTO table_name(column_list) VALUES(value_list)
ON CONFLICT target action;
In general, SQL does not make this easy. The typical solution is something like this:
Keep a table with one row per book borrowed and student.
Keep a count of outstanding books in the students table.
Maintain this count using triggers.
Add a check constraint on the count.
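A rough sketch of that pattern, using hypothetical table and column names (not the poster's actual schema):
-- Running counter on the borrower row, enforced by a CHECK constraint.
CREATE TABLE students (
    student_id        serial PRIMARY KEY,
    books_outstanding integer NOT NULL DEFAULT 0
        CHECK (books_outstanding BETWEEN 0 AND 7)
);

CREATE TABLE borrowed_books (
    borrow_id  serial PRIMARY KEY,
    student_id integer NOT NULL REFERENCES students(student_id),
    book_id    integer NOT NULL
);

-- Keep the counter in sync; the CHECK constraint rejects the 8th outstanding book.
CREATE OR REPLACE FUNCTION maintain_books_outstanding()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE students
           SET books_outstanding = books_outstanding + 1
         WHERE student_id = NEW.student_id;
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE students
           SET books_outstanding = books_outstanding - 1
         WHERE student_id = OLD.student_id;
        RETURN OLD;
    END IF;
    RETURN NULL;
END;
$$;

CREATE TRIGGER borrowed_books_count
AFTER INSERT OR DELETE ON borrowed_books
FOR EACH ROW EXECUTE PROCEDURE maintain_books_outstanding();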
Postgres does have more convenient methods. One method is to store the list of borrowed books as an array or in a JSON structure. Alas, this is not a relational format, and it doesn't allow the declaration of foreign key constraints.
That said, it does allow a simple check constraint on the books_borrowed column -- by using cardinality(), for instance. On the other hand, it doesn't make it easy to validate that there are no duplicates in the array, and INSERTs, UPDATEs, and DELETEs are more complicated.
For your particular problem, I would recommend the first approach.
As mentioned, the best place for this is APPLICATION-level checking. But otherwise, perhaps this is a case where the easiest method is doing nothing extra - i.e. don't try keeping a running total of active checkouts. Since Postgres has no issue with a trigger selecting from the table that fired it, just derive the number of outstanding books checked out. The following assumes the existence of a checkouts table defined as:
create table checkouts
( checkout_id serial
, student_employee_id integer not null
, book_id integer not null
, out_date date not null
, return_date date default null
) ;
Then create a row-level INSERT trigger on this table that calls the following function:
create or replace function limit_checkouts()
returns trigger
language plpgsql
as $$
declare
checkout_count integer;
begin
select count(*)
into checkout_count
from checkouts c
where c.student_employee_id = new.student_employee_id
and c.return_date is null ;
if checkout_count >= 7
then
raise exception 'Checkout limit exceeded';
end if;
return new;
end;
$$;
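The function then has to be attached to the table with a row-level BEFORE INSERT trigger, along these lines:
create trigger checkouts_limit_bir
    before insert on checkouts
    for each row
    execute procedure limit_checkouts();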
Most articles about lookup tables deal with its creation, initial population and use (for looking up: id-->value).
My question is about dynamic updating (inserting new values) of the lookup table, as new data is stored in data tables.
For example, we have a table of persons, and one attribute (column) of it is city of residency. Many persons would have the same value, so it makes sense to use a lookup table for it. As the list of cities that would appear is not known beforehand, the lookup table is initially empty.
To clarify, the value(s) of city is/are:
not known beforehand (we don't know what customer might contact us tomorrow)
there is no "list of all possible cities" (real life cities come and go, get renamed etc)
many persons will share the same value
initially, there will be a few different values (up to 10), later more (but not very many, a few hundred)
Also, the expected number of person objects will be thousands if not millions.
So the basic algorithm is (pseudocode):
procedure insertPerson(name,age,city)
{
cityId := lookup(city);
if cityId == null
cityId := insertIntoLookupTableAndReturnId(city);
INSERT INTO person_table VALUES (name,age,cityId);
}
What is a good lookup table organization for this problem? What exact code to use?
The goal is high performance of person insertion (whether the city is already in the lookup table or not).
General answers are welcome and Oracle 11g would be great.
Note: This is about an OLTP scenario. New persons are inserted in real time. There is no known list of persons that can be used for initialization of the lookup table.
Your basic approach appears to be OK, except for one small change I would make: the function lookup(city) will search for the city and return its ID and, if the city is not found, will insert a new record and return its ID. This way, you further encapsulate the management of the lookup table (cities). As such, your code would become:
procedure insertPerson(name,age,city)
{
INSERT INTO person_table VALUES (name,age,lookup(city));
}
One additional thing you may consider is to create a VIEW that would be used to query for persons' information, including the name of the city.
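As a rough PL/SQL sketch of that encapsulation (all object names here are hypothetical, and concurrent inserts of the same new city would still need handling, for example by catching DUP_VAL_ON_INDEX and re-selecting):
create or replace function lookup_city (p_city in varchar2) return number is
  v_id number;
begin
  select city_id into v_id from city_lookup where city_name = p_city;
  return v_id;
exception
  when no_data_found then
    -- city not seen before: create the lookup row and return its new id
    insert into city_lookup (city_id, city_name)
    values (city_lookup_seq.nextval, p_city)
    returning city_id into v_id;
    return v_id;
end lookup_city;
/

create or replace procedure insert_person (p_name varchar2, p_age number, p_city varchar2) is
  v_city_id number;
begin
  v_city_id := lookup_city(p_city);  -- resolve (or create) the lookup row first
  insert into person_table (name, age, city_id) values (p_name, p_age, v_city_id);
end insert_person;
/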
After some testing, the best performance (fewest block accesses) I could find was with an index-organized table as the lookup table and the SQL below for inserting data.
create table citylookup (key number primary key, city varchar2(100)) organization index;
create unique index cltx1 on citylookup(city);
create sequence lookupkeys;
create sequence datakeys;
create table data (x number primary key, k number references citylookup(key) not null);
-- "Rome" is the city we try to insert
insert all
when oldkey is null then -- if the city is not in the lookup yet
into citylookup values (lookupkeys.nextval, 'Rome') -- then insert it
-- finally, insert the data row with the correct lookup key
when 1=1 then into data values (datakeys.nextval,nvl(oldkey, lookupkeys.nextval))
select (select key from citylookup where city='Rome') as oldkey from dual;
Result: 6+2 blocks for the city-exists case, 10+2 for the city-doesn't-exist-yet case (as reported by SQL*Plus with set autotrace on: the first value is db block gets, the second consistent gets).
Alternatively, as suggested by Dudu Markovitz, the lookup table could be cached in the application, and in the hit case just perform a simple INSERT into the DATA table, which then costs only 6+1 block accesses (for the above test case). Here the problem is keeping the cached lookup table in sync with the database and with possible other instances of the server application.
PS: The above INSERT ALL command "wastes" a sequence value from the lookupkeys sequence on each run, even if no new city is inserted into the lookup table. It is an additional exercise to solve that.
I am currently working on a project for the management of oil distribution, and I need the receipts of every bill to be stored in a database. I am thinking of building a smart key for the receipts which will contain the first 2 letters of the city, the gas station id, the auto-increment number, the first letter of the month and the last 2 digits of the year. So it will be somewhat like this:
"AA-3-0001-J15". What I am wondering is how to make the auto-increment (AI) number go back to 0001 when the month changes. Any suggestions?
To answer the direct question - how to make the number restart at 1 at the beginning of the month.
Since it is not a simple IDENTITY column, you'll have to implement this functionality yourself.
To generate such a complex value you'll have to write a user-defined function or a stored procedure. Each time you need a new value of your key to insert a new row in the table, you'll call this function or execute this stored procedure.
Inside the function/stored procedure you have to make sure that it works correctly when two different sessions are trying to insert the row at the same time. One possible way to do it is to use sp_getapplock.
You didn't clarify whether the "auto increment" number is a single sequence across all cities and gas stations, or whether each city and gas station has its own sequence of numbers. Let's assume that we want a single sequence of numbers for all cities and gas stations within the same month. When the month changes, the sequence restarts.
The procedure should be able to answer the following question when you run it: Is the row that I'm trying to insert the first row of the current month? If the generated value is the first for the current month, then the counter should be reset to 1.
One method to answer this question is to have a helper table, which would have one row for each month. One column - date, second column - last number of the sequence. Once you have such helper table your stored procedure would check: what is the current month? what is the last number generated for this month? If such number exists in the helper table, increment it in the helper table and use it to compose the key. If such number doesn't exist in the helper table, insert 1 into it and use it to compose the key.
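A minimal T-SQL sketch of that helper-table idea (object names are assumptions; sp_getapplock serializes concurrent callers, and DATEFROMPARTS assumes SQL Server 2012 or later):
CREATE TABLE MonthlyReceiptCounter (
    MonthStart date NOT NULL PRIMARY KEY,
    LastNumber int  NOT NULL
);
GO
CREATE PROCEDURE GetNextReceiptNumber
    @NextNumber int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @MonthStart date = DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1);

    BEGIN TRANSACTION;
    -- Serialize concurrent sessions so two inserts never receive the same number.
    EXEC sp_getapplock @Resource = 'ReceiptNumber', @LockMode = 'Exclusive';

    UPDATE MonthlyReceiptCounter
       SET LastNumber = LastNumber + 1
     WHERE MonthStart = @MonthStart;

    IF @@ROWCOUNT = 0
        -- First receipt of the month: restart the counter at 1.
        INSERT INTO MonthlyReceiptCounter (MonthStart, LastNumber)
        VALUES (@MonthStart, 1);

    SELECT @NextNumber = LastNumber
      FROM MonthlyReceiptCounter
     WHERE MonthStart = @MonthStart;

    COMMIT TRANSACTION;
END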
Finally, I would not recommend making this composite value the primary key of the table. It is very unlikely that a user requirement says "make the primary key of your table like this". It is up to you how you handle it internally, as long as the accountant can see this magic set of letters and numbers next to the transaction in his report and user interface. The accountant doesn't know what a "primary key" is, but you do. And you know how to join a few tables of cities, gas stations, etc. together to get the information you need from a normalized database.
Oh, by the way, sooner or later you will have more than 9999 transactions per month.
Do you want to store all that in one column? That sounds to me like a composite key over four columns...
Which could look like the following:
CREATE TABLE receipts (
CityCode VARCHAR2(2),
GasStationId NUMERIC,
AutoKey NUMERIC,
MonthCode VARCHAR2(2),
PRIMARY KEY (CityCode, GasStationId, AutoKey, MonthCode)
);
Which DBMS are you using? (MySQL, MSSQL, PostgreSQL, ...?)
If it's MySQL you could have a batch-job which runs on the month's first which executes:
ALTER TABLE tablename AUTO_INCREMENT = 1
But that logic would be on application layer instead of DB-layer...
In such cases, it is best to use a User-Defined function to generate this key and then store it. Like :
Create Function MyKeyGenerator(
@city varchar(250) = '',
@gas_station_id varchar(250) = '')
AS
/*Do stuff here
*/
My guess is, you may need another little table that keeps the last generated auto-number for the month, and you may need to update it for the first record generated during the month. For the following records during the month, you will fetch from there and increment by 1. You can also use a stored procedure that returns an integer as a return code, just for the auto-number part, and then do the rest in a function.
By the way, you may want to note that using the first letter of the month has pitfalls, because two months can have the same first letter. Maybe try a two-digit numeric month or the first three letters of the month name.
If you are ready not to insist that the AI number be exactly of identity type, you can have another table where it is a regular non-identity integer, and then run a SQL Server Agent job calling a stored procedure that does the incrementing.
One of my job functions is being responsible for mining and marketing on a large newsletter subscription database. Each one of my newsletters has four columns (newsletter_status, newsletter_datejoined, newsletter_dateunsub, and newsletter_unsubmid).
In addition to these columns, I also have a master unsub column that our customer service dept. can update to accommodate irate subscribers who wish to be removed from all our mailings, and another column called emailaddress_status that gets updated if a hard bounce (or a set number of soft bounces) occurs.
When I pull a count for current valid subscribers for one list I use the following syntax:
select count (*) from subscriber_db
WHERE (emailaddress_status = 'VALID' OR emailaddress_status IS NULL)
AND newsletter_status = 'Y'
and unsub = 'N' and newsletter_datejoined >= '2013-01-01';
What I'd like to have is one query that looks at all columns matching %_status, applies the aforementioned criteria, and orders the results by current count size.
I'd like the output to list each newsletter alongside its current subscriber count, ordered by count size, etc.
I've searched around the web for months looking for something similar, but other than running the queries individually in a terminal and exporting the results, I've not been able to successfully get them all in one query.
I'm running PostgreSQL 9.2.3.
A proper test case would be each aggregate total matching the counts I get when running the individual queries.
Here's my obfuscated table definition showing ordinal placement, column_type, char_limit, and is_nullable.
Your schema is absolutely horrifying:
24 ***_status text YES
25 ***_status text YES
26 ***_status text YES
27 ***_status text YES
28 ***_status text YES
29 ***_status text YES
where I presume the masked *** is something like the name of a publication/newsletter/etc.
You need to read about data normalization or you're going to have a problem that keeps on growing until you hit PostgreSQL's row-size limit.
Since each item of interest is in a different column the only way to solve this with your existing schema is to write dynamic SQL using PL/PgSQL's EXECUTE format(...) USING .... You might consider this as an interim option only, but it's a bit like using a pile driver to jam the square peg into the round hole because a hammer wasn't big enough.
There are no column name wildcards in SQL, like *_status or %_status. Columns are a fixed component of the row, with different types and meanings. Whenever you find yourself wishing for something like this it's a sign that your design needs to be re-thought.
I'm not going to write an example since (a) this is an email marketing company and (b) the "obfuscated" schema is completely unusable for any kind of testing without lots of manual work re-writing it. (In future, please provide CREATE TABLE and INSERT statements for your dummy data, or better yet, a http://sqlfiddle.com/). You'll find lots of examples of dynamic SQL in PL/PgSQL - and warnings about how to avoid the resulting SQL injection risks by proper use of format - with a quick search of Stack Overflow. I've written a bunch in the past.
Please, for your sanity and the sanity of whoever else needs to work on this system, normalize your schema.
You can create a view over the normalized tables to present the old structure, giving you time to adapt your applications. With a bit more work you can even define an INSTEAD OF trigger on the view (newer Pg versions) or a DO INSTEAD rule (older Pg versions) to make the view updatable and insertable, so your app can't even tell that anything has changed - though this comes at a performance cost, so it's better to adapt the app if possible.
Start with something like this:
CREATE TABLE subscriber (
id serial primary key,
email_address text not null,
-- please read http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
-- for why I merged "fname" and "lname" into one field:
realname text,
-- Store birth month/year as a "date" with a "CHECK" constraint forcing it to be the 1st day
-- of the month. Much easier to work with.
birthmonth date,
CONSTRAINT birthmonth_must_be_day_1 CHECK ( extract(day from birthmonth) = 1),
postcode text,
-- Congratulations! You made "gender" a "text" field to start with, you avoided
-- one of the most common mistakes in schema design, the boolean/binary gender
-- field!
gender text,
-- What's MSO? Should have a COMMENT ON...
mso text,
source text,
-- Maintain these with a trigger. If you want modified to update when any child record
-- changes you can do that with triggers on subscription and reducedfreq_subscription.
created_on timestamp not null default current_timestamp,
last_modified timestamp not null,
-- Use the native PostgreSQL UUID type, after running CREATE EXTENSION "uuid-ossp";
uuid uuid not null,
uuid2 uuid not null,
brand text,
-- etc etc
);
CREATE TABLE reducedfreq_subscription (
id serial primary key,
subscriber_id integer not null references subscriber(id),
-- Suspect this was just a boolean stored as text in your schema, in which case
-- delete it.
reducedfreqsub text,
reducedfreqpref text,
-- plural, might be a comma list? Should be in sub-table ("join table")
-- if so, but without sample data can only guess.
reducedfreqtopics text,
-- date can be NOT NULL since the row won't exist unless they joined
reducedfreq_datejoined date not null,
reducedfreq_dateunsub date
);
CREATE TABLE subscription (
id serial primary key,
subscriber_id integer not null references subscriber(id),
sub_name text not null,
status text not null,
datejoined date not null,
dateunsub date
);
CREATE TABLE subscriber_activity (
last_click timestamptz,
last_open timestamptz,
last_hardbounce timestamptz,
last_softbounce timestamptz,
last_successful_mailing timestamptz
);
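Once the data is normalized like this, the per-newsletter count you asked about no longer needs dynamic SQL at all; a rough sketch (the status value and date filter are assumptions carried over from your original query):
SELECT sub.sub_name,
       count(*) AS current_subscribers
  FROM subscription sub
  JOIN subscriber s ON s.id = sub.subscriber_id
 WHERE sub.status = 'Y'
   AND sub.dateunsub IS NULL
   AND sub.datejoined >= DATE '2013-01-01'
 GROUP BY sub.sub_name
 ORDER BY current_subscribers DESC;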
To call it merely "horrifying" shows a great deal of tact and kindness on your part. Thank You. :) I inherited this schema only recently (which was originally created by the folks at StrongMail).
I have a full relational DB re-architecture project on my roadmap this year - the sample normalization is very much in line with what I'd been working on. Very interesting insight on realname; I hadn't really thought about that. I suppose the only reason StrongMail had it broken out was for first-name email personalization.
MSO is multiple systems operator (cable company). We're a large lifestyle media company, and the newsletters we produce are on food, travel, homes and gardening.
I'm creating a Fiddle for this - I'm new here so going forward I'll be more mindful of what you guys need to be able to help. Thank you!
I have a couple of tables in a SQL Server 2008 database that I need to generate unique IDs for. I have looked at the "identity" column, but the IDs really need to be unique and shared between all the tables.
So if I have, say, (5) five tables of the flavour "asset infrastructure" and I want to run with a unique ID between them as a combined group, I need some sort of generator that looks at all (5) five tables and issues the next ID which is not duplicated in any of those (5) five tables.
I know this could be done with some sort of stored procedure but I'm not sure how to go about it. Any ideas?
The simplest solution is to set your identity seeds and increment on each table so they never overlap.
Table 1: Seed 1, Increment 5
Table 2: Seed 2, Increment 5
Table 3: Seed 3, Increment 5
Table 4: Seed 4, Increment 5
Table 5: Seed 5, Increment 5
The identity column mod 5 will tell you which table the record is in. You will use up your identity space five times faster so make sure the datatype is big enough.
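A minimal sketch of that seed/increment layout (table and column names are assumptions):
CREATE TABLE AssetType1 (Id bigint IDENTITY(1, 5) PRIMARY KEY, Name varchar(100));
CREATE TABLE AssetType2 (Id bigint IDENTITY(2, 5) PRIMARY KEY, Name varchar(100));
CREATE TABLE AssetType3 (Id bigint IDENTITY(3, 5) PRIMARY KEY, Name varchar(100));
CREATE TABLE AssetType4 (Id bigint IDENTITY(4, 5) PRIMARY KEY, Name varchar(100));
CREATE TABLE AssetType5 (Id bigint IDENTITY(5, 5) PRIMARY KEY, Name varchar(100));
-- Id % 5 yields 1, 2, 3, 4 or 0, identifying tables 1 through 5 respectively.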
Why not use a GUID?
You could let them each have an identity that seeds from numbers far enough apart never to collide.
GUIDs would work but they're butt-ugly, and non-sequential if that's significant.
Another common technique is to have a single-column table with an identity that dispenses the next value each time you insert a record. If you need the values pulled from a common sequence, it can be useful to have a second column indicating which table each value was dispensed to.
You realize there are logical design issues with this, right?
Reading into the design a bit, it sounds like what you really need is a single table called "Asset" with an identity column, and then either:
a) 5 additional tables for the subtypes of assets, each with a foreign key to the primary key on Asset; or
b) 5 views on Asset that each select a subset of the rows and then appear (to users) like the 5 original tables you have now.
If the columns on the tables are all the same, (b) is the better choice; if they're all different, (a) is the better choice. This is a classic DB spin on the supertype / subtype relationship.
Alternately, you could do what you're talking about and recreate the IDENTITY functionality yourself with a stored proc that wraps INSERT access on all 5 tables. Note that you'll have to put a TRANSACTION around it if you want guarantees of uniqueness, and if this is a popular table, that might make it a performance bottleneck. If that's not a concern, a proc like that might take the form:
CREATE PROCEDURE InsertAsset_Table1
AS
BEGIN
    BEGIN TRANSACTION
    -- SELECT MIN INTEGER NOT ALREADY USED IN ANY OF THE FIVE TABLES
    -- INSERT INTO Table1 WITH THAT ID
    COMMIT TRANSACTION -- or roll back on error, etc.
END
Again, SQL is highly optimized for helping you out if you choose the patterns I mention above, and NOT optimized for this kind of thing (there's overhead with creating the transaction AND you'll be issuing shared locks on all 5 tables while this process is going on). Compare that with using the PK / FK method above, where SQL Server knows exactly how to do it without locks, or the view method, where you're only inserting into 1 table.
I found this when searching on Google. I am facing a similar problem for the first time. I had the idea to have a dedicated ID table specifically to generate the IDs, but I was unsure whether it was considered OK design. So I just wanted to say THANKS for the confirmation.. it looks like it is an adequate solution, although not ideal.
I have a very simple solution. It should be good for cases when the number of tables is small:
create table T1(ID int primary key identity(1,2), rownum varchar(64))
create table T2(ID int primary key identity(2,2), rownum varchar(64))
insert into T1(rownum) values('row 1')
insert into T1(rownum) values('row 2')
insert into T1(rownum) values('row 3')
insert into T2(rownum) values('row 1')
insert into T2(rownum) values('row 2')
insert into T2(rownum) values('row 3')
select * from T1
select * from T2
drop table T1
drop table T2
This is a common problem, for example when using a table of people (called PERSON, singular, please) where each person is categorized, for example Doctors, Patients, Employees, Nurses etc.
It makes a lot of sense to create a table for each of these categories of people that contains their specific category information, like an employee's start date and salary or a nurse's qualifications and number.
A Patient, for example, may have many nurses and doctors that work on him, so a many-to-many table that links Patient to other people in the PERSON table facilitates this nicely. In this table there should be some description of the relationship between these people, which leads us back to the categories for people.
Since a Doctor and a Patient could create the same Primary Key ID in their own tables, it becomes very useful to have a Globally unique ID or Object ID.
A good way to do this, as suggested, is to have a table dedicated to auto-incrementing the primary key. Perform an INSERT on that table first to obtain the OID, then use it for the new PERSON.
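A minimal T-SQL sketch of that dispenser-table idea (all names are hypothetical):
-- One row is inserted here per object; its identity value becomes the OID.
CREATE TABLE ObjectId (
    OID         bigint IDENTITY(1, 1) PRIMARY KEY,
    DispensedTo varchar(30) NOT NULL   -- which table the value was issued for
);

DECLARE @NewOid bigint;
INSERT INTO ObjectId (DispensedTo) VALUES ('PERSON');
SET @NewOid = SCOPE_IDENTITY();
-- Then: INSERT INTO PERSON (PersonId, ...) VALUES (@NewOid, ...);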
I like to go a step further. When things get ugly (some new developer gets his hands on the database, or even worse, a really old developer), it's very useful to add more meaning to the OID.
Usually this is done programmatically, not with the database engine, but if you use a BIGINT for all the primary key IDs then you have lots of room to prefix the number with a visually identifiable sequence. For example, all Doctors' IDs could begin with 100, all Patients' with 110, all Nurses' with 120.
To that I would append, say, a Julian date or a Unix date+time, and finally append the auto-increment ID.
This would result in numbers like:
110,2455892,00000001
120,2455892,00000002
100,2455892,00000003
Since the Julian date 100 years from now is only 2492087, you can see that 7 digits will adequately store this value.
A BIGINT is a 64-bit (8-byte) signed integer with a range of -9.22x10^18 to 9.22x10^18 (-2^63 to 2^63 - 1). Notice the exponent is 18. That's 18 digits you have to work with.
Using this design, you are limited to 100 million OIDs, 999 categories of people, and dates up to... well past the shelf life of your database, but I suspect that's good enough for most solutions.
The operations required to create an OID like this are all multiplication and division, which avoids all the gear-grinding of text manipulation.
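As a sketch of that multiply-and-divide composition (the digit widths follow the example values above; everything else is hypothetical):
-- 3-digit category prefix, 7-digit Julian date, 8-digit auto-increment.
DECLARE @Category   bigint = 110;      -- e.g. Patients
DECLARE @JulianDate bigint = 2455892;
DECLARE @AutoId     bigint = 1;        -- from the OID dispenser table

DECLARE @Oid bigint = @Category * CAST(1000000000000000 AS bigint)  -- shift left 15 digits
                    + @JulianDate * 100000000                       -- shift left 8 digits
                    + @AutoId;

-- Decompose the same value with division and modulo:
SELECT @Oid / CAST(1000000000000000 AS bigint) AS Category,    -- 110
       (@Oid / 100000000) % 10000000           AS JulianDate,  -- 2455892
       @Oid % 100000000                        AS AutoId;      -- 1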
The disadvantage is that INSERTs require more than a simple T-SQL statement, but the advantage is that when you are tracking down errant data, or even being clever in your queries, your OID is visually telling you a lot more than a random number or, worse, an eyesore like a GUID.