Incrementing PK column from specific number (PostgreSQL) - sql

In my PostgreSQL database I have a table users which has two columns: a PK named uid (serial) and user_type (integer). Is it possible for regular users (user_type 1) to have uids that start from 0 and increment normally, and for non-regular users (user_type 2) to have uids that start from (let's say) 5000 and increment from there (5000, 5001, 5002...) each time a new non-regular user is added?
I won't have more than 2000 regular users, so overlap between uids of regular and non-regular users will never happen.

A serial data type will create a sequence and pull the default value for your column from that sequence. For what you're trying to do you'd need two sequences: pull from them and insert the uid explicitly. This is not something serial can or should do.
To echo @a_horse_with_no_name, you shouldn't encode information in a serial column. A generated primary key is only acceptable if it is completely opaque to anyone using the table. Please consider just letting serial do its work and updating your application code to react properly to user_type. Since your concern seems to be ID collisions with external entities, I'd suggest storing the user IDs generated by an external system in a separate field, say extern_uid.
Or have the external system generate UUID strings that you can safely use in your uid column. If the external system is a PostgreSQL database as well, you might use the uuid-ossp module to generate the UUID/GUID.
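A minimal sketch of the external-system side, assuming it is written in Python and uses the standard uuid module instead of PostgreSQL's uuid-ossp; the helper name new_external_uid is hypothetical. Random (version 4) UUIDs are generated independently on each side, so no coordination between the two systems is needed to avoid collisions in practice.

```python
import uuid

def new_external_uid() -> str:
    """Return a random (version 4) UUID in its canonical 36-character text form."""
    return str(uuid.uuid4())

# Two independent calls, e.g. made on two different systems.
a = new_external_uid()
b = new_external_uid()
print(a)
```

Note that storing such values in uid would mean changing the column from serial/integer to a uuid or text type.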
If you absolutely have to use sequences, you'd need to:
CREATE SEQUENCE uid_one START 1;
CREATE SEQUENCE uid_two START 5000;
INSERT INTO users (uid, user_type) VALUES (nextval('uid_one'::regclass), 1);
INSERT INTO users (uid, user_type) VALUES (nextval('uid_two'::regclass), 2);
Choosing the appropriate statement is left to the application.
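One way the application side could look, as a sketch only: a fixed mapping from user_type to the two sequences above, producing a parameterised INSERT. The names SEQUENCE_FOR_TYPE and insert_user_sql are hypothetical, and the %s placeholder style assumes a DB-API driver such as psycopg2.

```python
# Hypothetical mapping from user_type to the sequences created above.
SEQUENCE_FOR_TYPE = {1: "uid_one", 2: "uid_two"}

def insert_user_sql(user_type: int) -> tuple[str, tuple]:
    """Build an INSERT that draws uid from the sequence belonging to user_type."""
    try:
        seq = SEQUENCE_FOR_TYPE[user_type]
    except KeyError:
        raise ValueError(f"unknown user_type: {user_type}") from None
    # The sequence name comes from the fixed mapping, never from user input,
    # so placing it in the SQL text is safe; user_type is still passed as a parameter.
    sql = f"INSERT INTO users (uid, user_type) VALUES (nextval('{seq}'::regclass), %s)"
    return sql, (user_type,)
```

With psycopg2 this would be used as cur.execute(*insert_user_sql(2)).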

Related

how to set ID start from 10000 or any 5 digit number in SQL and C#?

I have a table called 'users'.
user_id
user_name
user_password
The user_id column starts from 1.
Is there any way to start from 10000 or any 5-digit number?
Note: I use SQL in my C# program.
When defining the table, you can use
Create Table Users
(
user_id int IDENTITY(10000,1),
...
)
or if the table is already defined with rows in it, then you can use
DBCC CHECKIDENT ('table_name', RESEED, new_value);
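where new_value is the value to reseed from. If any rows have been inserted since the table was created, the next row gets new_value plus the increment (so RESEED to 9999 to continue at 10000); only on a never-used or TRUNCATEd table does the next row get new_value itself.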
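The IDENTITY/DBCC syntax above is SQL Server specific. If the database happens to be SQLite instead (an assumption, the question does not say), a comparable effect can be sketched with an AUTOINCREMENT column plus a seeded row in SQLite's internal sqlite_sequence table, which the SQLite documentation says may be modified with ordinary INSERT/UPDATE statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite remember the largest id ever used in sqlite_sequence.
conn.execute(
    "CREATE TABLE users ("
    " user_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_name TEXT,"
    " user_password TEXT)"
)
# Seed the counter: the next generated id will be 9999 + 1 = 10000.
conn.execute("INSERT INTO sqlite_sequence (name, seq) VALUES ('users', 9999)")

cur = conn.execute(
    "INSERT INTO users (user_name, user_password) VALUES (?, ?)", ("alice", "x")
)
first_id = cur.lastrowid
```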
Don't do it.
The primary key of a table should only ensure row uniqueness, and should not convey any information about the data. Its "form" should be of no consequence.
You probably need to ask yourself: why do I want specific values? If you want to expose them in the UI, or send them to an external related application, then you may be better off creating a secondary [auto-]generated column just for that purpose. I would strongly recommend you leave the PK alone and don't touch it.
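A minimal sketch of that separation, using SQLite through Python's sqlite3 module for illustration: the surrogate user_id stays internal, while a separate public_id column is what the UI or external application sees. The public_id column, the add_user helper, and the choice of random hex tokens are all assumptions.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " user_id INTEGER PRIMARY KEY,"      # surrogate key, never shown to anyone
    " public_id TEXT NOT NULL UNIQUE,"   # what the UI / external apps see
    " user_name TEXT NOT NULL)"
)

def add_user(name: str) -> str:
    """Insert a user and return its public identifier."""
    public_id = secrets.token_hex(8)  # 16 hex characters
    conn.execute(
        "INSERT INTO users (public_id, user_name) VALUES (?, ?)", (public_id, name)
    )
    return public_id

pid = add_user("alice")
```

The format of public_id can then be changed at will without touching any foreign keys.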

advantages and disadvantages of database automatic number generator for each row vs manual numbering for each row

Imagine two tables implemented as follows:
In the first table, row numbers are generated automatically by the database system.
In the second table, row numbers are assigned manually by the programmer in sequential order.
The main question is what are the advantages and disadvantages of these two approaches?
One distinct advantage of letting the database manage auto-numbering, rather than creating the numbers manually, is that the database implementation is safe under concurrent access (thread safe), whereas manual numbering usually (in 99.9% of cases) is not, because it is hard to do correctly.
On the other hand, the database implementation does not guarantee sequential numbering - there can be gaps in the numbers.
Given these two facts, an auto-increment column should be used only as a surrogate key, when its values have no business meaning and serve simply as row identifiers.
Please note that when using a surrogate key, it's best to also enforce uniqueness of a natural key - otherwise you might get rows where all the data is duplicated except the surrogate key.
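A small sketch of that advice in SQLite (via Python's sqlite3): id is the surrogate key, while email is assumed to be the natural key and is still declared UNIQUE. The customer table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"        # surrogate key, assigned by SQLite
    " email TEXT NOT NULL UNIQUE,"    # natural key, still enforced
    " name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer (email, name) VALUES ('ann@example.com', 'Ann')")
try:
    # Without the UNIQUE constraint this would silently create a second row
    # that differs from the first only in its surrogate id.
    conn.execute("INSERT INTO customer (email, name) VALUES ('ann@example.com', 'Ann')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```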
When the database generates the numbers automatically, you have less work.
Think about a sign-up system with fields like name, email, password and so on:
1.) The number is generated by the database, so you can just insert the data into the table.
2.) If it is not, you first have to fetch the last id, so instead of a single INSERT you need a SELECT plus an INSERT.
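Both variants side by side, sketched with Python's sqlite3 module; the signup table and its columns are hypothetical. In case 1 the driver reports the generated key back through cursor.lastrowid, so no extra query is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE signup (id INTEGER PRIMARY KEY, name TEXT, email TEXT, password_hash TEXT)"
)

# 1.) One statement: SQLite picks the id and the cursor reports it.
cur = conn.execute(
    "INSERT INTO signup (name, email, password_hash) VALUES (?, ?, ?)",
    ("Ann", "ann@example.com", "<hash>"),
)
generated_id = cur.lastrowid

# 2.) Manual numbering: an extra round trip to find the next id first.
next_id = conn.execute("SELECT COALESCE(MAX(id), 0) + 1 FROM signup").fetchone()[0]
conn.execute(
    "INSERT INTO signup (id, name, email, password_hash) VALUES (?, ?, ?, ?)",
    (next_id, "Bob", "bob@example.com", "<hash>"),
)
```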
Another reason: what happens when you delete a row from your table?
In a forum, for example, you might want to delete an account but not all of its posts. As a workaround, a post whose user_id no longer exists tells you that it belonged to a deleted or banned account. If you give a new user the number of a deleted user, you will run into trouble, because those old posts will suddenly appear to belong to the new user.
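The reuse problem is easy to observe in SQLite: a "largest id + 1" rule (which is what manual MAX(id) + 1 numbering does, and what a plain INTEGER PRIMARY KEY does by default) hands out a deleted user's number again, while SQLite's AUTOINCREMENT keyword never reuses one. The helper id_after_deleting_newest is hypothetical.

```python
import sqlite3

def id_after_deleting_newest(autoincrement: bool) -> int:
    """Insert two users, delete the newest, insert another; return the new id."""
    conn = sqlite3.connect(":memory:")
    pk = "INTEGER PRIMARY KEY AUTOINCREMENT" if autoincrement else "INTEGER PRIMARY KEY"
    conn.execute(f"CREATE TABLE users (user_id {pk}, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('first')")          # id 1
    conn.execute("INSERT INTO users (name) VALUES ('deleted later')")  # id 2
    conn.execute("DELETE FROM users WHERE user_id = 2")
    cur = conn.execute("INSERT INTO users (name) VALUES ('newcomer')")
    return cur.lastrowid

reused = id_after_deleting_newest(autoincrement=False)     # 2: the deleted user's id
not_reused = id_after_deleting_newest(autoincrement=True)  # 3: ids are never reused
```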

Inserting test data which references another table without hard coding the foreign key values

I'm trying to write a SQL query that will insert test data into two tables, one of which references the other.
The tables are created with something like the following:
CREATE TABLE address (
address_id INTEGER IDENTITY PRIMARY KEY,
...[irrelevant columns]
);
CREATE TABLE member (
...[irrelevant columns],
address INTEGER,
FOREIGN KEY(address) REFERENCES address(address_id)
);
I want ids in both tables to auto increment, so that I can easily insert new rows later without having to look into the table for ids.
I need to insert some test data into both tables, about 25 rows in each. Hardcoding ids for the insert causes issues when inserting new rows later, because the automatic values for the id columns try to start at 1 (which is already in the database). So I need to let the ids be generated automatically, but I also need to know which ids exist in order to insert test data into the member table. I don't believe the autogenerated ones are guaranteed to be consecutive, so I can't assume it is safe to hardcode those.
This is test data - I don't care which record I link each member row I am inserting to, only that there is an address record in the address table with that id.
My thoughts for how to do this so far include:
Insert addresses individually, returning the id, then use that to insert an individual member (cons: potentially messy, not sure of the syntax, harder to see expected sets of addresses/members in the test data)
Do the member insert with a SELECT address_id FROM address WHERE [some condition that will only give one row] for the address column (cons: also a bit messy, involves a quite long statement for something I don't care about)
Is there a neater way around this problem?
I particularly wonder if there is a way to either:
Let the auto increment controlling functions be aware of manually inserted id values, or
Get the list of inserted ids from the address table into a variable which I can use values from in turn to insert members.
Ideally, I'd like this to work with as many (irritatingly slightly different) database engines as possible, but I need to support at least postgresql and sqlite - ideally in a single query, although I could have two separate ones. (I have separate ones for creating the tables, the sole difference being INTEGER GENERATED BY DEFAULT AS IDENTITY instead of just IDENTITY.)
https://www.postgresql.org/docs/8.1/static/functions-sequence.html
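Sounds like LASTVAL() is what you're looking for. It will also work in the real world to keep related inserts consistent, because it is scoped to your own session: it returns the value most recently obtained by nextval() in that session, so inserts made concurrently by other sessions cannot interfere.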
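SQLite, which the question also needs to support, has no LASTVAL(); its closest counterpart is the last_insert_rowid() SQL function, which is likewise scoped to the current connection. A sketch run through Python's sqlite3 module, where the street and name columns stand in for the question's elided "irrelevant columns" and are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street TEXT
);
CREATE TABLE member (
    member_id INTEGER PRIMARY KEY,
    name TEXT,
    address INTEGER REFERENCES address(address_id)
);
-- Each member is linked to the address inserted immediately before it.
INSERT INTO address (street) VALUES ('1 Test Street');
INSERT INTO member (name, address) VALUES ('Alice', last_insert_rowid());
INSERT INTO address (street) VALUES ('2 Test Street');
INSERT INTO member (name, address) VALUES ('Bob', last_insert_rowid());
""")

rows = conn.execute(
    "SELECT m.name, a.street FROM member m"
    " JOIN address a ON a.address_id = m.address ORDER BY m.name"
).fetchall()
```

The address/member pairs stay visible next to each other in the script, which addresses the readability concern in the question.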

How to prevent adding identical records to SQL database

I am writing a program that recovers structured data as individual records from a (damaged) file and collects the results into a sqlite database.
The program is invoked several times with slightly different recovery parameters. That often recovers the same data, but sometimes different data, from the file.
Now, every time I run my program with different parameters, it's supposed to add just the newly (different) found items to the same database.
That means that I need a fast way to tell if each recovered record is already present in the DB or not, in order to add them only if they're not existing in the DB yet.
I understand that for each record I want to add, I could first do a SELECT for all columns to see if there is already a matching record in the DB, and only add the new one if no same is found.
But since I'm adding 10000s of records, doing a SELECT for each of these records feels pretty inefficient (slow) to me.
I wonder if there's a smarter way to handle this. I.e., is there a way I can tell sqlite that I do not want duplicate entries, and so it automatically detects and rejects them? I know about the UNIQUE modifier, but that's not it because it applies to single columns only, doesn't it? I'd need to be able to say that the combination of COL1+COL2+COL3 must be unique. Is there a way to do that?
Note: I never want to update any existing records. I only want to collect a set of different records.
Bonus part - performance
In a classic programming language, I'd use a key-value dictionary where the key is the sum of all a record's values. Similarly, I could calculate a hash code for each added record and look that hash code up first. If there's no match, then the record is surely not in the DB yet; if there is a match, I'd still have to search the DB for any duplicates. That would surely be faster already, but I still wonder if sqlite can make this more efficient.
Try:
sqlite> create table foo (
...> a int,
...> b int,
...> unique(a, b)
...> );
sqlite>
sqlite> insert into foo values(1, 2);
sqlite> insert into foo values(2, 1);
sqlite> insert into foo values(1, 2);
Error: columns a, b are not unique
sqlite>
You could use a UNIQUE column constraint or, to declare a multi-column unique constraint, UNIQUE (...) ON CONFLICT:
CREATE TABLE name (id int, col_name1 type, col_name2 type, UNIQUE (col_name1, col_name2) ON CONFLICT IGNORE);
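Applied to the question's three-column case and run from Python's sqlite3 module, a sketch might look like this; the record table and the sample tuples are hypothetical. With ON CONFLICT IGNORE, duplicates are dropped silently, so whole batches can be passed to executemany without a per-row SELECT. One caveat: UNIQUE treats NULLs as distinct, so rows containing a NULL in any of the three columns are never considered duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record ("
    " col1 TEXT, col2 TEXT, col3 TEXT,"
    " UNIQUE (col1, col2, col3) ON CONFLICT IGNORE)"
)

run_1 = [("a", "b", "c"), ("d", "e", "f")]
run_2 = [("a", "b", "c"), ("g", "h", "i")]   # first tuple was already recovered

with conn:  # one transaction per batch keeps bulk inserts fast
    conn.executemany("INSERT INTO record VALUES (?, ?, ?)", run_1)
with conn:
    conn.executemany("INSERT INTO record VALUES (?, ?, ?)", run_2)

count = conn.execute("SELECT COUNT(*) FROM record").fetchone()[0]
```

With a plain UNIQUE (col1, col2, col3) constraint, writing INSERT OR IGNORE in the statement has the same effect.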
SQLite has two ways of expressing uniqueness constraints: PRIMARY KEY and UNIQUE. Both are enforced through an index (for an INTEGER PRIMARY KEY the rowid itself plays that role), so the duplicate check is an index lookup rather than a table scan.
If you do not want to use an SQL approach (as mentioned in other answers), you can select all your data when the program starts, store it in a dictionary, and use the dictionary to decide which records to insert into your DB.
The benefit of this approach is the single select is much faster than many small selects.
The disadvantage is that it won't work well if you don't have enough memory to store your data in.
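A sketch of that in-memory approach with Python's sqlite3 module. Since only the keys matter here, a set of row tuples takes the place of the dictionary; the record table and the insert_new_records helper are hypothetical.

```python
import sqlite3

def insert_new_records(conn, records):
    """Insert only records not already in the table; return how many were added."""
    # One SELECT loads every existing row into an in-memory set of tuples.
    existing = set(conn.execute("SELECT col1, col2, col3 FROM record"))
    fresh = []
    for rec in records:
        rec = tuple(rec)
        if rec not in existing:
            existing.add(rec)   # also catches duplicates within this batch
            fresh.append(rec)
    with conn:
        conn.executemany("INSERT INTO record VALUES (?, ?, ?)", fresh)
    return len(fresh)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (col1 TEXT, col2 TEXT, col3 TEXT)")
added_first = insert_new_records(conn, [("a", "b", "c"), ("d", "e", "f")])
added_second = insert_new_records(
    conn, [("a", "b", "c"), ("g", "h", "i"), ("g", "h", "i")]
)
```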

Creating database tables with "either or" type fields

I have a database that tracks players through their attempts at a game. To accomplish this, I keep a table of users and store the attempts in a separate table. The schema for these tables is:
CREATE TABLE users (
id BIGINT PRIMARY KEY, -- the local unique ID for this user
name TEXT UNIQUE, -- a self-chosen username for the user
first_name TEXT, -- the user's first name
last_name TEXT, -- the user's last name
email TEXT, -- the user's email address
phone TEXT -- the user's phone number
);
CREATE TABLE trials (
timestamp TIMESTAMP PRIMARY KEY, -- the time the trial took place
userid BIGINT, -- the ID of the user who completed the trial
score NUMERIC, -- the score for the trial
level NUMERIC, -- the difficulty level the trial ran at
penalties NUMERIC -- the number of penalties accrued in the trial
);
Now I need to be able to store attempts that come from "transient" users. These attempts should not be linked back to an existing user. However, these transient users will still be able to enter a name that displays in the results. This name is not required to be unique in the table, since it does not represent a "real" user.
My first thought was to create a new field in the trials table called name. If userid is null, I would know it is a transient user, but I would still be able to show the name fields in the results. This approach doesn't quite smell right, and it seems like it will make my queries a bit more complicated. Additionally, it seems like I'm duplicating data in a sense.
Another thought was to replace userid with a userref text field that would be some kind of formatted string representing the user. For example, if the value were enclosed in curly braces, I would know it's an ID, i.e. {58199204}. If the value were not enclosed, I would treat it as a transient user. This more accurately represents what I'm trying to do conceptually (i.e. it's either an ID or a transient user string), but it would really complicate my queries.
I'm using SQLite for the backend... It lacks some of the extended features of SQL Server or MySQL.
Any thoughts on these or another approach to the problem?
Without more information about why a transient user can use the system but not exist in it, I concur with your idea to:
Add a NAME column to the TRIALS table
Make the USER_ID column in the TRIALS table nullable/optional in order to indicate transient user status
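A sketch of this layout in SQLite, run from Python's sqlite3 module. The CHECK requiring exactly one of userid/name is an added assumption, not part of the answer, and the sample rows are invented. A single LEFT JOIN with COALESCE produces a display name for both kinds of trial, so the queries stay simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE
);
CREATE TABLE trials (
    timestamp TIMESTAMP PRIMARY KEY,
    userid    BIGINT REFERENCES users(id),   -- NULL means a transient user
    name      TEXT,                          -- only used for transient users
    score     NUMERIC,
    -- Assumed rule: exactly one of userid / name is filled in.
    CHECK ((userid IS NULL) <> (name IS NULL))
);
INSERT INTO users (id, name) VALUES (58199204, 'alice');
INSERT INTO trials (timestamp, userid, name, score)
    VALUES ('2024-01-01 10:00', 58199204, NULL, 90);
INSERT INTO trials (timestamp, userid, name, score)
    VALUES ('2024-01-01 10:05', NULL, 'Guest Bob', 75);
""")

results = conn.execute("""
    SELECT COALESCE(u.name, t.name) AS display_name, t.score
    FROM trials t LEFT JOIN users u ON u.id = t.userid
    ORDER BY t.timestamp
""").fetchall()
```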
If you could allow a transient user to exist in the system, I would recommend:
Create a USER_TYPE_CODE table
Update the USERS table to include a USER_TYPE_CODE column (with a foreign key reference to the USER_TYPE_CODE table)
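Those two steps sketched in SQLite via Python's sqlite3 module. The code values 'REG'/'TRN' are hypothetical, and name is deliberately left non-unique here because transient names need not be unique. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
conn.executescript("""
CREATE TABLE user_type_code (
    code        TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
INSERT INTO user_type_code VALUES ('REG', 'registered user'), ('TRN', 'transient user');
CREATE TABLE users (
    id             INTEGER PRIMARY KEY,
    name           TEXT,
    user_type_code TEXT NOT NULL REFERENCES user_type_code(code)
);
""")
conn.execute("INSERT INTO users (name, user_type_code) VALUES ('alice', 'REG')")
conn.execute("INSERT INTO users (name, user_type_code) VALUES ('Guest Bob', 'TRN')")

try:
    conn.execute("INSERT INTO users (name, user_type_code) VALUES ('x', 'BOGUS')")
    bad_code_rejected = False
except sqlite3.IntegrityError:  # the foreign key rejects unknown type codes
    bad_code_rejected = True
```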
You can either create a UserType field in the Users table and add "transient" users to that table (which will increase the size of the Users table), or create a UserType field on the Trials table and an additional TransientUsers table.
Either way, the UserType field tells you how to interpret userid.
I'd like to point out that you really shouldn't use the formatted-string approach. What happens if a user finds a buggy input path into your database and enters "{8437101}" (or whatever user ID they want)?
SQLite lets you mix types in a field. I'd suggest you do as you were thinking, but without the braces. Disallow numeric names. If the userid is a number, which is exactly when it matches an id in the users table, it is a user id. If not, it's the name of a transient user.
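A sketch of this mixed-type column using SQLite's typeof() function, run from Python's sqlite3 module. The userref column is declared without a type so that SQLite applies no type affinity and keeps each value exactly as inserted; a column declared TEXT would instead convert 58199204 to the string '58199204' and defeat the typeof() check. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- No declared type on userref: values are stored exactly as given.
CREATE TABLE trials (
    timestamp TIMESTAMP PRIMARY KEY,
    userref,
    score NUMERIC
);
INSERT INTO users VALUES (58199204, 'alice');
INSERT INTO trials VALUES ('2024-01-01 10:00', 58199204, 90);
INSERT INTO trials VALUES ('2024-01-01 10:05', 'Guest Bob', 75);
""")

rows = conn.execute("""
    SELECT typeof(t.userref),
           CASE WHEN typeof(t.userref) = 'integer' THEN u.name ELSE t.userref END
    FROM trials t
    LEFT JOIN users u ON typeof(t.userref) = 'integer' AND u.id = t.userref
    ORDER BY t.timestamp
""").fetchall()
```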