Factor (string) to Numeric in PostgreSQL - sql

Similar to this, is it possible to convert a String field to Numeric in PostgreSQL. For instance,
create table test (name text);
insert into test (name) values ('amy');
insert into test (name) values ('bob');
insert into test (name) values ('bob');
insert into test (name) values ('celia');
and add a field that is
name | num
-------+-----
amy | 1
bob | 2
bob | 2
celia | 3

The most effective "hash"-function of all is a serial primary key - giving you a unique number like you wished for in the question.
I also deal with duplicates in this demo:
CREATE TEMP TABLE string (
string_id serial PRIMARY KEY
,string text NOT NULL UNIQUE -- no dupes
,ct int NOT NULL DEFAULT 1 -- count instead of dupe rows
);
Then you would enter new strings like this:
(Data-modifying CTE requires PostgreSQL 9.1 or later.)
WITH x AS (SELECT 'abc'::text AS nu)
, y AS (
UPDATE string s
SET ct = ct + 1
FROM x
WHERE s.string = x.nu
RETURNING TRUE
)
INSERT INTO string (string)
SELECT nu
FROM x
WHERE NOT EXISTS (SELECT 1 FROM y);
If the string nu already exists, the count (ct) is increased by 1. If not, a new row is inserted, starting with a count of 1.
The UNIQUE also adds an index on the column string.string automatically, which leads to optimal performance for this query.
Add additional logic (triggers ?) for UPDATE / DELETE to make this bullet-proof - if needed.
Note, there is a super-tiny race condition here, if two concurrent transactions try to add the same string at the same moment in time. To be absolutely sure, you could use SERIALIZABLE transactions. More info and links under this this related question.
Live demo at sqlfiddle.

How 'bout a hash, such as md5, of name?
create table test (name text, hash text);
-- later
update test set hash = md5(name);
If you need to convert that md5 text to a number: Hashing a String to a Numeric Value in PostgresSQL

If they are all single characters, you could do this:
ALTER TABLE test ADD COLUMN num int;
UPDATE test SET num = ascii(name);
Though that would only return the character for the first letter if the string was more than a single character.

The exact case shown in your request can be produced with the dense_rank window function:
regress=# SELECT name, dense_rank() OVER (ORDER BY name) FROM test;
name | dense_rank
-------+------------
amy | 1
bob | 2
bob | 2
celia | 3
(4 rows)
so if you were adding a number for each row, you'd be able to do something like:
ALTER TABLE test ADD COLUMN some_num integer;
WITH gen(gen_name, gen_num) AS
(SELECT name, dense_rank() OVER (ORDER BY name) FROM test GROUP BY name)
UPDATE test SET some_num = gen_num FROM gen WHERE name = gen_name;
ALTER TABLE test ALTER COLUMN some_num SET NOT NULL;
however I think it's much more sensible to use a hash or to assign generated keys. I'm just showing that your example can be achieved.
The biggest problem with this approach is that inserting new data is a pain. It's a ranking (like your example shows) so if you INSERT INTO test (name) VALUES ('billy'); then the ranking changes.

Related

DB2 Using with statement to keep last id

In my project I need to create a script that insert data with auto generate value for the primary key and then to reuse this number for foreign on other tables.
I'm trying to use the WITH statement in order to keep that value.
For instance, I'm trying to do this:
WITH tmp as (SELECT ID FROM (INSERT INTO A ... VALUES ...))
INSERT INTO B ... VALUES tmp.ID ...
But I can't make it work.
Is it at least possible to do it or am I completely wrong???
Thank you
Yes, it is possible, if your DB2-server version supports the syntax.
For example:
create table xemp(id bigint generated always as identity, other_stuff varchar(20));
create table othertab(xemp_id bigint);
SELECT id FROM FINAL TABLE
(INSERT INTO xemp(other_stuff)
values ('a'), ('b'), ('c'), ('d')
) ;
The above snippet of code gives the result below:
ID
--------------------
1
2
3
4
4 record(s) selected.
If you want to re-use the ID to populate another table:
with tmp1(id) as ( SELECT id FROM new TABLE (INSERT INTO xemp(other_stuff) values ('a1'), ('b1'), ('c1'), ('d1') ) tmp3 )
, tmp2 as (select * from new table (insert into othertab(xemp_id) select id from tmp1 ) tmp4 )
select * from othertab;
As per my understanding
You will have to create an auto-increment field with the sequence object (this object generates a number sequence).
You can CREATE SEQUENCE to achieve the auto increment value :
CREATE SEQUENCE seq_person
MINVALUE 1
START WITH 1
INCREMENT BY 1
CACHE 10

Get all missing values between two limits in SQL table column

I am trying to assign ID numbers to records that are being inserted into an SQL Server 2005 database table. Since these records can be deleted, I would like these records to be assigned the first available ID in the table. For example, if I have the table below, I would like the next record to be entered at ID 4 as it is the first available.
| ID | Data |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 5 | ... |
The way that I would prefer this to be done is to build up a list of available ID's via an SQL query. From there, I can do all the checks within the code of my application.
So, in summary, I would like an SQL query that retrieves all available ID's between 1 and 99999 from a specific table column.
First build a table of all N IDs.
declare #allPossibleIds table (id integer)
declare #currentId integer
select #currentId = 1
while #currentId < 1000000
begin
insert into #allPossibleIds
select #currentId
select #currentId = #currentId+1
end
Then, left join that table to your real table. You can select MIN if you want, or you could limit your allPossibleIDs to be less than the max table id
select a.id
from #allPossibleIds a
left outer join YourTable t
on a.id = t.Id
where t.id is null
Don't go for identity,
Let me give you an easy option while i work on a proper one.
Store int from 1-999999 in a table say Insert_sequence.
try to write an Sp for insertion,
You can easly identify the min value that is present in your Insert_sequence and not in
your main table, store this value in a variable and insert the row with ID from variable..
Regards
Ashutosh Arya
You could also loop through the keys. And when you hit an empty one Select it and exit Loop.
DECLARE #intStart INT, #loop bit
SET #intStart = 1
SET #loop = 1
WHILE (#loop = 1)
BEGIN
IF NOT EXISTS(SELECT [Key] FROM [Table] Where [Key] = #intStart)
BEGIN
SELECT #intStart as 'FreeKey'
SET #loop = 0
END
SET #intStart = #intStart + 1
END
GO
From there you can use the key as you please. Setting a #intStop to limit the loop field would be no problem.
Why do you need a table from 1..999999 all information you need is in your source table. Here is a query which give you minimal ID to insert in gaps.
It works for all combinations:
(2,3,4,5) - > 1
(1,2,3,5) - > 4
(1,2,3,4) - > 5
SQLFiddle demo
select min(t1.id)+1 from
(
select id from t
union
select 0
)
t1
left join t as t2 on t1.id=t2.id-1
where t2.id is null
Many people use an auto-incrementing integer or long value for the Primary Key of their tables, and it is often called ID or MyEntityID or something similar. This column, since it's just an auto-incrementing integer, often has nothing to do with the data being stored itself.
These types of "primary keys" are called surrogate keys. They have no meaning. Many people like these types of IDs to be sequential because it is "aesthetically pleasing", but this is a waste of time and resources. The database could care less about which IDs are being used and which are not.
I would highly suggest you forget trying to do this and just leave the ID column auto-increment. You should also create an index on your table that is made up of those (subset of) columns that can uniquely identify each record in the table (and even consider using this index as your primary key index). In rare cases where you would need to use all columns to accomplish that, that is where an auto-incrementing primary key ID is extremely useful—because it may not be performant to create an index over all columns in the table. Even so, the database engine could care less about this ID (e.g. which ones are in use, are not in use, etc.).
Also consider that an integer-based ID has a maximum total of 4.2 BILLION IDs. It is quite unlikely that you'll exhaust the supply of integer-based IDs in any short amount of time, which further bolsters the argument for why this sort of thing is a waste of time and resources.

Adding string to the primary key?

I want to add some string with the primary key value while creating the table in sql?
Example:
my primary key column should automatically generate values like below:
'EMP101'
'EMP102'
'EMP103'
How to achieve it?
Try this: (For SQL Server 2012)
UPDATE MyTable
SET EMPID = CONCAT('EMP' , EMPID)
Or this: (For SQL Server < 2012)
UPDATE MyTable
SET EMPID = 'EMP' + EMPID
SQLFiddle for SQL Server 2008
SQLFiddle for SQL Server 2012
Since you want to set auto increment in VARCHAR type column you can try this table schema:
CREATE TABLE MyTable
(EMP INT NOT NULL IDENTITY(1000, 1)
,[EMPID] AS 'EMP' + CAST(EMP AS VARCHAR(10)) PERSISTED PRIMARY KEY
,EMPName VARCHAR(20))
;
INSERT INTO MyTable(EMPName) VALUES
('AA')
,('BB')
,('CC')
,('DD')
,('EE')
,('FF')
Output:
| EMP | EMPID | EMPNAME |
----------------------------
| 1000 | EMP1000 | AA |
| 1001 | EMP1001 | BB |
| 1002 | EMP1002 | CC |
| 1003 | EMP1003 | DD |
| 1004 | EMP1004 | EE |
| 1005 | EMP1005 | FF |
See this SQLFiddle
Here you can see EMPID is auto incremented column with Primary key.
Source: HOW TO SET IDENTITY KEY/AUTO INCREMENT ON VARCHAR COLUMN IN SQL SERVER (Thanks to #bvr)
What the rule of thumb is, is that never use meaningful information in primary keys (like Employee Number / Social Security number). Let that just be a plain autoincremented integer. However constant the data seems - it may change at one point (new legislation comes and all SSNs are recalculated).
it seems the only reason you are want to use a non-integer keys is that the key is generated as string concatenation with another column to make it unique.
From a best practice perspective, it is strongly recommended that integer primary keys are used, but often, this guidance is ignored.
May be going through the following posts might be of help:
Should I design a table with a primary key of varchar or int?
SQL primary key: integer vs varchar
You can achieve it at least in two ways:
Generate new id on the fly when you insert a new record
Create INSTEAD OF INSERT trigger that will do that for you
If you have a table schema like this
CREATE TABLE Table1
([emp_id] varchar(12) primary key, [name] varchar(64))
For the first scenario you can use a query
INSERT INTO Table1 (emp_id, name)
SELECT newid, 'Jhon'
FROM
(
SELECT 'EMP' + CONVERT(VARCHAR(9), COALESCE(REPLACE(MAX(emp_id), 'EMP', ''), 0) + 1) newid
FROM Table1 WITH (TABLOCKX, HOLDLOCK)
) q
Here is SQLFiddle demo
For the second scenario you can a trigger like this
CREATE TRIGGER tg_table1_insert ON Table1
INSTEAD OF INSERT AS
BEGIN
DECLARE #max INT
SET #max =
(SELECT COALESCE(REPLACE(MAX(emp_id), 'EMP', ''), 0)
FROM Table1 WITH (TABLOCKX, HOLDLOCK)
)
INSERT INTO Table1 (emp_id, name)
SELECT 'EMP' + CONVERT(VARCHAR(9), #max + ROW_NUMBER() OVER (ORDER BY (SELECT 1))), name
FROM INSERTED
END
Here is SQLFiddle demo
I am looking to do something similar but don't see an answer to my problem here.
I want a primary Key like "JonesB_01" as this is how we want our job number represented in our production system.
--ID | First_Name | Second_Name | Phone | Etc..
-- Bob Jones 9999-999-999
--ID = "Second_Name"+"F"irst Initial+"_(01-99)"
The number 01-99 has been included to allow for multiple instances of a customer with the same surname and first initial. In our industry it's not unusual for the same customer to have work done on multiple occasions but are not repeat business on an ongoing basis. I expect this convention to last a very long time. If we ever exceed it, then I can simply add a third interger.
I want this to auto populate to keep data entry as simple as possible.
I managed to get a solution to work using Excel formulars and a few helper cells but am new to SQL.
--CellA2 = JonesB_01 (=concatenate(D2+E2))
--CellB2 = "Bob"
--CellC2 = "Jones"
--CellD2 = "JonesB" (=if(B2="","",Concatenate(C2,Left(B2)))
--CellE2 = "_01" (=concatenate("_",Text(F2,"00"))
--CellF2 = "1" (=If(D2="","",Countif($D$2:$D2,D2))
Thanks.
SELECT 'EMP' || TO_CHAR(NVL(MAX(TO_NUMBER(SUBSTR(A.EMP_NO, 4,3))), '000')+1) AS NEW_EMP_NO
FROM
(SELECT 'EMP101' EMP_NO
FROM DUAL
UNION ALL
SELECT 'EMP102' EMP_NO
FROM DUAL
UNION ALL
SELECT 'EMP103' EMP_NO
FROM DUAL
) A

Is it possible to use a PG sequence on a per record label?

Does PostgreSQL 9.2+ provide any functionality to make it possible to generate a sequence that is namespaced to a particular value? For example:
.. | user_id | seq_id | body | ...
----------------------------------
- | 4 | 1 | "abc...."
- | 4 | 2 | "def...."
- | 5 | 1 | "ghi...."
- | 5 | 2 | "xyz...."
- | 5 | 3 | "123...."
This would be useful to generate custom urls for the user:
domain.me/username_4/posts/1
domain.me/username_4/posts/2
domain.me/username_5/posts/1
domain.me/username_5/posts/2
domain.me/username_5/posts/3
I did not find anything in the PG docs (regarding sequence and sequence functions) to do this. Are sub-queries in the INSERT statement or with custom PG functions the only other options?
You can use a subquery in the INSERT statement like #Clodoaldo demonstrates. However, this defeats the nature of a sequence as being safe to use in concurrent transactions, it will result in race conditions and eventually duplicate key violations.
You should rather rethink your approach. Just one plain sequence for your table and combine it with user_id to get the sort order you want.
You can always generate the custom URLs with the desired numbers using row_number() with a simple query like:
SELECT format('domain.me/username_%s/posts/%s'
, user_id
, row_number() OVER (PARTITION BY user_id ORDER BY seq_id)
)
FROM tbl;
db<>fiddle here
Old sqlfiddle
Maybe this answer is a little off-piste, but I would consider partitioning the data and giving each user their own partitioned table for posts.
There's a bit of overhead to the setup as you will need triggers for managing the DDL statements for the partitions, but would effectively result in each user having their own table of posts, along with their own sequence with the benefit of being able to treat all posts as one big table also.
General gist of the concept...
psql# CREATE TABLE posts (user_id integer, seq_id integer);
CREATE TABLE
psql# CREATE TABLE posts_001 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# CREATE TABLE posts_002 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# select * from posts;
user_id | seq_id
---------+--------
1 | 1
1 | 2
2 | 1
2 | 2
(4 rows)
I left out some rather important CHECK constraints in the above setup, make sure you read the docs for how these kinds of setups are used
insert into t values (user_id, seq_id) values
(4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4))
Check for a duplicate primary key error in the front end and retry if needed.
Update
Although #Erwin advice is sensible, that is, a single sequence with the ordering in the select query, it can be expensive.
If you don't use a sequence there is no defeat of the nature of the sequence. Also it will not result in a duplicate key violation. To demonstrate it I created a table and made a python script to insert into it. I launched 3 parallel instances of the script inserting as fast as possible. And it just works.
The table must have a primary key on those columns:
create table t (
user_id int,
seq_id int,
primary key (user_id, seq_id)
);
The python script:
#!/usr/bin/env python
import psycopg2, psycopg2.extensions
query = """
begin;
insert into t (user_id, seq_id) values
(4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4));
commit;
"""
conn = psycopg2.connect('dbname=cpn user=cpn')
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
cursor = conn.cursor()
for i in range(0, 1000):
while True:
try:
cursor.execute(query)
break
except psycopg2.IntegrityError, e:
print e.pgerror
cursor.execute("rollback;")
cursor.close()
conn.close()
After the parallel run:
select count(*), max(seq_id) from t;
count | max
-------+------
3000 | 3000
Just as expected. I developed at least two applications using that logic and one of then is more than 13 years old and never failed. I concede that if you are Facebook or some other giant then you could have a problem.
Yes:
CREATE TABLE your_table
(
column type DEFAULT NEXTVAL(sequence_name),
...
);
More details here:
http://www.postgresql.org/docs/9.2/static/ddl-default.html

Search for element in array of composite types

Using PostgreSQL 9.0
I have the following table setup
CREATE TABLE person (age integer, last_name text, first_name text, address text);
CREATE TABLE my_people (mperson person[]);
INSERT INTO my_people VALUES(array[ROW(44, 'John', 'Smith', '1234 Test Blvd.')::person]);
Now, i want to be able to write a select statement that can search and compare values of my composite types inside my mperson array column.
Example:
SELECT * FROM my_people WHERE 20 > ANY( (mperson) .age);
However when trying to execute this query i get the following error:
ERROR: column notation .age applied to type person[], which is not a composite type
LINE 1: SELECT mperson FROM my_people WHERE 20 > ANY((mperson).age);
So, you can see i'm trying to test the values of the composite type inside my array.
I know, i'm not supposed to use arrays and composites in my tables, but this best suites our applications requirements.
Also, we have several nested composite arrays, so a generic solution that would allow me to search many levels would be appreciated.
The construction ANY in your case looks redundant. You can write the query that way:
SELECT * FROM my_people WHERE (mperson[1]).age < 20;
Of course, if you have multiple values in this array, that won't work, but you can't get the exact array element the other way neither.
Why do you need arrays at all? You can just write one element of type person per row.
Check also the excellent HStore module, which might better suit your generic needs.
Temporary test setup:
CREATE TEMP TABLE person (age integer, last_name text, first_name text
, address text);
CREATE TEMP TABLE my_people (mperson person[]);
-- test-data, demonstrating 3 different syntax styles:
INSERT INTO my_better_people (mperson)
VALUES
(array[(43, 'Stack', 'Over', '1234 Test Blvd.')::person])
,(array['(44,John,Smith,1234 Test Blvd.)'::person,
'(21,Maria,Smith,1234 Test Blvd.)'::person])
,('{"(33,John,Miller,12 Test Blvd.)",
"(22,Frank,Miller,12 Test Blvd.)",
"(11,Bodi,Miller,12 Test Blvd.)"}');
Call (almost the solution):
SELECT (p).*
FROM (
SELECT unnest(mperson) AS p
FROM my_people) x
WHERE (p).age > 33;
Returns:
age | last_name | first_name | address
-----+-----------+------------+-----------------
43 | Stack | Over | 1234 Test Blvd.
44 | John | Smith | 1234 Test Blvd.
key is the unnest() function, that's available in 9.0.
Your mistake in the example is that you forget about the ARRAY layer in between. unnest() returns one row per base element, then you can access the columns in the complex type as demonstrated.
Brave new world
IF you actually want a whole people instead of an individual that fits the criteria, I propose you add a primary key to the table and proceed as follows:
CREATE TEMP TABLE my_better_people (id serial, mperson person[]);
-- shortcut to populate the new world by emigration from the old world ;)
INSERT INTO my_better_people (mperson)
SELECT mperson FROM my_people;
Find individuals:
SELECT id, (p).*
FROM (
SELECT id, unnest(mperson) AS p
FROM my_better_people) x
WHERE (p).age > 20;
Find whole people (solution):
SELECT *
FROM my_better_people p
WHERE EXISTS (
SELECT 1
FROM (
SELECT id, unnest(mperson) AS p
FROM my_better_people
) x
WHERE (p).age > 20
AND x.id = p.id
);
You can do it without a primary key, but that would be foolish.