Is it possible to use a PG sequence on a per record label? - sql

Does PostgreSQL 9.2+ provide any functionality to make it possible to generate a sequence that is namespaced to a particular value? For example:
.. | user_id | seq_id | body | ...
----------------------------------
.. |       4 |      1 | "abc...."
.. |       4 |      2 | "def...."
.. |       5 |      1 | "ghi...."
.. |       5 |      2 | "xyz...."
.. |       5 |      3 | "123...."
This would be useful to generate custom urls for the user:
domain.me/username_4/posts/1
domain.me/username_4/posts/2
domain.me/username_5/posts/1
domain.me/username_5/posts/2
domain.me/username_5/posts/3
I did not find anything in the PG docs (regarding sequence and sequence functions) to do this. Are sub-queries in the INSERT statement or with custom PG functions the only other options?

You can use a subquery in the INSERT statement like @Clodoaldo demonstrates. However, this defeats the purpose of a sequence, which is to be safe to use in concurrent transactions: it will result in race conditions and, eventually, duplicate key violations.
You should rather rethink your approach: use just one plain sequence for your table, and combine it with user_id to get the sort order you want.
You can always generate the custom URLs with the desired numbers using row_number() with a simple query like:
SELECT format('domain.me/username_%s/posts/%s'
            , user_id
            , row_number() OVER (PARTITION BY user_id ORDER BY seq_id)
             )
FROM   tbl;
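For readers who want to see exactly what that window function computes, here is a small Python sketch (with made-up seq_id values) of what row_number() OVER (PARTITION BY user_id ORDER BY seq_id) produces per user:

```python
from itertools import groupby

# Hypothetical rows: (user_id, seq_id) pairs, as stored with one global sequence.
rows = [(4, 101), (4, 205), (5, 103), (5, 150), (5, 999)]

# Emulate row_number() OVER (PARTITION BY user_id ORDER BY seq_id):
# within each user_id partition, number rows 1, 2, 3, ... ordered by seq_id.
urls = []
for user_id, group in groupby(sorted(rows), key=lambda r: r[0]):
    for n, (_, seq_id) in enumerate(group, start=1):
        urls.append(f"domain.me/username_{user_id}/posts/{n}")

print(urls)
# ['domain.me/username_4/posts/1', 'domain.me/username_4/posts/2',
#  'domain.me/username_5/posts/1', 'domain.me/username_5/posts/2',
#  'domain.me/username_5/posts/3']
```

The gaps in the global sequence disappear: only the per-user rank is exposed in the URL.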

Maybe this answer is a little off-piste, but I would consider partitioning the data and giving each user their own partitioned table for posts.
There's a bit of overhead to the setup, as you will need triggers to manage the DDL statements for the partitions, but it would effectively give each user their own table of posts, along with their own sequence, with the added benefit of being able to treat all posts as one big table.
General gist of the concept...
psql# CREATE TABLE posts (user_id integer, seq_id integer);
CREATE TABLE
psql# CREATE TABLE posts_001 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# CREATE TABLE posts_002 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# select * from posts;
 user_id | seq_id
---------+--------
       1 |      1
       1 |      2
       2 |      1
       2 |      2
(4 rows)
I left out some rather important CHECK constraints in the above setup; make sure you read the docs on how these kinds of partitioning setups are used.
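The net effect of this partitioned setup can be sketched as a toy Python model (just the idea, not the database mechanics): each user gets an independent counter, the way each child table gets its own serial, while everything remains queryable as one collection:

```python
from collections import defaultdict

# Toy model: one independent counter per user, mimicking a per-partition serial.
counters = defaultdict(int)
posts = []  # the "parent table" view across all partitions

def insert_post(user_id):
    counters[user_id] += 1           # each user's sequence advances independently
    posts.append((user_id, counters[user_id]))

for uid in (1, 1, 2, 2):
    insert_post(uid)

print(posts)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

This matches the psql session above: user 1 and user 2 each get seq_id values 1 and 2.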

insert into t (user_id, seq_id) values
    (4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4))
Check for a duplicate primary key error in the front end and retry if needed.
Update
Although @Erwin's advice is sensible (that is, a single sequence with the ordering done in the select query), it can be expensive.
If you don't use a sequence, there is no sequence behavior to defeat. Nor will it result in duplicate key violations. To demonstrate, I created a table and wrote a Python script to insert into it. I launched 3 parallel instances of the script, inserting as fast as possible. And it just works.
The table must have a primary key on those columns:
create table t (
    user_id int,
    seq_id int,
    primary key (user_id, seq_id)
);
The python script:
#!/usr/bin/env python
import psycopg2
import psycopg2.extensions

query = """
    begin;
    insert into t (user_id, seq_id) values
        (4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4));
    commit;
"""
conn = psycopg2.connect('dbname=cpn user=cpn')
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
cursor = conn.cursor()
for i in range(1000):
    while True:
        try:
            cursor.execute(query)
            break
        except psycopg2.IntegrityError as e:
            # duplicate key: another instance won the race; roll back and retry
            print(e.pgerror)
            cursor.execute("rollback;")
cursor.close()
conn.close()
After the parallel run:
select count(*), max(seq_id) from t;
 count | max
-------+------
  3000 | 3000
Just as expected. I have developed at least two applications using that logic; one of them is more than 13 years old and has never failed. I concede that if you are Facebook, or some other giant, then you could have a problem.

Yes:
CREATE TABLE your_table
(
    column type DEFAULT NEXTVAL('sequence_name'),
    ...
);
More details here:
http://www.postgresql.org/docs/9.2/static/ddl-default.html


Transform Row Values to Column Names

I have a table of customer contacts and their role. Simplified example below.
customer | role        | userid
---------+-------------+-------
       1 | Support     |    123
       1 | Support     |    456
       1 | Procurement |    567
...
desired output
customer | Support1 | Support2 | Support3 | Support4 | Procurement1 | Procurement2
---------+----------+----------+----------+----------+--------------+-------------
       1 |      123 |      456 |     null |     null |          567 |         null
       2 |      123 |      456 |    12333 |    45776 |          888 |        56723
So I need to dynamically create the number of required columns based on how many users are in each role. It's a small number of roles, and I can assume a maximum of 5 users in the same role, which means in the worst case I need to generate 5 columns for each role. The userids don't need to be in any particular order.
My current approach gets one userid per role/customer; then a second query pulls another id that wasn't part of the first result set, and so on. That way I have to statically create 5 queries. It works, but I was wondering whether there is a more efficient way of dynamically creating the needed columns.
Example of pulling one user per role:
SELECT customer, role,
       (SELECT top 1 userid
        FROM temp as tmp1
        WHERE tmp1.customer = tmp2.customer and tmp1.role = tmp2.role
       ) as userid
FROM temp as tmp2
group by customer, role
order by customer, role
SQL create with dummy data
create table temp
(
    customer int,
    role nvarchar(20),
    userid int
)
insert into temp values (1,'Support',123)
insert into temp values (1,'Support',456)
insert into temp values (1,'Procurement',567)
insert into temp values (2,'Support',123)
insert into temp values (2,'Support',456)
insert into temp values (2,'Procurement',888)
insert into temp values (2,'Support',12333)
insert into temp values (2,'Support',45776)
insert into temp values (2,'Procurement',56723)
You may need to adapt your approach slightly if you want to avoid getting into the realm of programming user-defined table functions (which is what you would need in order to generate columns dynamically). You don't mention which SQL database variant you are using (SQL Server, PostgreSQL, ?). I'm going to assume it supports some form of string aggregation (they pretty much all do), but the syntax varies, so you will probably have to adjust the code to your circumstances.
You mention that the number of roles is small (5-ish?). The proposed solution is to generate a comma-separated list of user ids, one per role, using common table expressions (CTEs) and the LISTAGG function (variously named STRING_AGG, GROUP_CONCAT, etc. in other databases).
WITH tsupport
     AS (SELECT customer,
                Listagg(userid, ',') AS "Support"
         FROM   temp
         WHERE  ROLE = 'Support'
         GROUP  BY customer),
     tprocurement
     AS (SELECT customer,
                Listagg(userid, ',') AS "Procurement"
         FROM   temp
         WHERE  ROLE = 'Procurement'
         GROUP  BY customer)
     --> tnextrole...
     --> AS (SELECT ... for additional roles
     -->     Listagg...
SELECT a.customer,
       "Support",
       "Procurement"
       --> "Next Role" etc.
FROM   tsupport a
       JOIN tprocurement b
         ON a.customer = b.customer
       --> JOIN tNextRole ...
Fiddle is here, with the result based on your dummy data.
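If it helps to see the aggregation step in isolation, here is a Python sketch of what LISTAGG(userid, ',') ... GROUP BY customer does per role, run over the dummy data from the question:

```python
from collections import defaultdict

# Dummy data from the question: (customer, role, userid)
rows = [(1, 'Support', 123), (1, 'Support', 456), (1, 'Procurement', 567),
        (2, 'Support', 123), (2, 'Support', 456), (2, 'Procurement', 888),
        (2, 'Support', 12333), (2, 'Support', 45776), (2, 'Procurement', 56723)]

# Emulate LISTAGG(userid, ',') grouped by customer, one cell per role:
agg = defaultdict(dict)  # customer -> {role: "id,id,..."}
for customer, role, userid in rows:
    cell = agg[customer]
    cell[role] = f"{cell[role]},{userid}" if role in cell else str(userid)

for customer in sorted(agg):
    print(customer, agg[customer])
# 1 {'Support': '123,456', 'Procurement': '567'}
# 2 {'Support': '123,456,12333,45776', 'Procurement': '888,56723'}
```

Each comma-separated cell is what one CTE column ("Support", "Procurement") holds for that customer; the SQL then just joins the CTEs on customer.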

How to sort the GUID or Unique identifer column in SQL

I want to sort the newID column by using ORDER BY, but the id order changes each and every time I execute the query. I have tried using the CAST operator to convert to VARCHAR and sort that, but it is not working.
declare @temp table
(
    id int identity(1,1),
    newID UNIQUEIDENTIFIER
)
insert into @temp
SELECT NEWID()
insert into @temp
SELECT NEWID()
insert into @temp
SELECT NEWID()
insert into @temp
SELECT NEWID()
select * from @temp
select * from @temp order by cast(newID as varchar(40)) asc
id newID
1 9653de71-34c2-4409-bcee-6809e170e197
2 3f3e7ab8-a516-4dd2-a04b-31feeac8fdea
3 1f1d38b8-3c31-4479-ba48-b71ce8525ea3
4 33f1e2b9-f4c3-4e57-9267-ff729a326318
id newID
3 1f1d38b8-3c31-4479-ba48-b71ce8525ea3
4 33f1e2b9-f4c3-4e57-9267-ff729a326318
2 3f3e7ab8-a516-4dd2-a04b-31feeac8fdea
1 9653de71-34c2-4409-bcee-6809e170e197
I need the second result set to come back in the same order as the first when using the ORDER BY statement.
What I think you are trying to do:
I think you want to return your newID values, sorted in ascending order with an incrementing row number in the id column.
What I think you are misunderstanding:
The ID of a row does not need to be in any particular order, it just needs to be unique. If you are using the incrementing integer value of your id column as an identifier elsewhere in your solution, then you do not need to worry about the order. This is important because if you were to insert a new newID value that when sorted fell between two existing newID values, some of the id values would have to change to retain the ordering. This would break any relationships based on the id value.
It is important to note here that the int identity(1,1) value automatically increments (not always by 1) for each row as it is inserted. If you insert your data 'out of order' then the value will also be 'out of order'. I think you are misunderstanding what this functionality is for. In short, it doesn't do what you want it to.
You can also order a uniqueidentifier column as is. You will be getting an error because you have called the column newID, which is a reserved keyword within SQL Server. If you want to keep this name (which I suggest you don't), you will need to reference it within square brackets: order by [newID]. Bear in mind that the 'correct' ordering of a uniqueidentifier is not the same as the alphabetical ordering of the value you see on the screen, much like how the numeric ordering of 1, 2, 3, 10, 11, 12 differs from the alphabetical ordering of the same values as 1, 10, 11, 12, 2, 3.
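To illustrate that ordering point, here is a Python sketch of the comparison as I understand SQL Server's documented behaviour: byte groups are compared from right to left, with the trailing 6-byte group most significant (the within-group byte order here is a simplification, but the trailing group dominates for these values). Sorting the question's four GUIDs with this key reproduces the order by [newID] output shown further down:

```python
import uuid

# SQL Server compares uniqueidentifiers by byte group, right to left:
# bytes 10-15 first, then 8-9, 6-7, 4-5, and finally 0-3.
# (Assumption: this group precedence reflects SQL Server's documented behaviour;
# within-group byte order is simplified here.)
def mssql_guid_key(s):
    b = uuid.UUID(s).bytes  # bytes in the order the hex string displays them
    return (b[10:16], b[8:10], b[6:8], b[4:6], b[0:4])

guids = ['9653de71-34c2-4409-bcee-6809e170e197',
         '3f3e7ab8-a516-4dd2-a04b-31feeac8fdea',
         '1f1d38b8-3c31-4479-ba48-b71ce8525ea3',
         '33f1e2b9-f4c3-4e57-9267-ff729a326318']

for g in sorted(guids, key=mssql_guid_key):
    print(g)
# 3f3e7ab8-... (trailing group 31fe...), then 9653de71-... (68...),
# then 1f1d38b8-... (b7...), then 33f1e2b9-... (ff...)
```

Note how the sort is driven by the last dash-separated group, not by the leading characters, which is why the CAST-to-varchar attempt gives a different order.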
How to actually get to your desired output:
If on the off chance you really do just want to get the row number of the newID value that is in your table, you can do this with the row_number windowed function:
declare @temp table(id int identity(1,1)
                   ,newID UNIQUEIDENTIFIER
                   );
insert into @temp(newID) values
 ('9653de71-34c2-4409-bcee-6809e170e197')
,('3f3e7ab8-a516-4dd2-a04b-31feeac8fdea')
,('1f1d38b8-3c31-4479-ba48-b71ce8525ea3')
,('33f1e2b9-f4c3-4e57-9267-ff729a326318');

-- Showing the int identity, which increments as new rows are added
select *
from @temp
order by [newID];

-- Using the row_number function to generate the id value at runtime
select row_number() over (order by [newID]) as id
      ,[newID]
from @temp
order by [newID];
Outputs
Using int identity:
+----+--------------------------------------+
| id | newID |
+----+--------------------------------------+
| 2 | 3F3E7AB8-A516-4DD2-A04B-31FEEAC8FDEA |
| 1 | 9653DE71-34C2-4409-BCEE-6809E170E197 |
| 3 | 1F1D38B8-3C31-4479-BA48-B71CE8525EA3 |
| 4 | 33F1E2B9-F4C3-4E57-9267-FF729A326318 |
+----+--------------------------------------+
and using row_number:
+----+--------------------------------------+
| id | newID |
+----+--------------------------------------+
| 1 | 3F3E7AB8-A516-4DD2-A04B-31FEEAC8FDEA |
| 2 | 9653DE71-34C2-4409-BCEE-6809E170E197 |
| 3 | 1F1D38B8-3C31-4479-BA48-B71CE8525EA3 |
| 4 | 33F1E2B9-F4C3-4E57-9267-FF729A326318 |
+----+--------------------------------------+
NEWSEQUENTIALID() - this does not generate a random id for the GUID. We can use it instead of NEWID().
CREATE TABLE Product_A
(
    ID uniqueidentifier default NEWSEQUENTIALID(),
    productname int
)
Insert Into Product_A(productname) values(1)
Insert Into Product_A(productname) values(2)
Insert Into Product_A(productname) values(3)
Select * from Product_A
Select * from Product_A order by ID
I have used it like this, but I want to use NEWSEQUENTIALID() at the column level, not as a default value in the table creation, and it is not possible to use it like that. Any suggestion for converting NEWID() to NEWSEQUENTIALID(), so that we can sort after inserting into the table?
Basically, in our project we use a GUID for each transaction (sometimes it can be null as well), stored in the unique identifier column newid. After inserting into a table, we want to get rows back in the same order they were inserted, but when we ORDER BY the GUID column the sort appears random. I want the same order as the rows were inserted in the table.

Cassandra: Update multiple rows with different values

Hi, I have a similar table in Cassandra:
CREATE TABLE TestTable (
    id text,
    group text,
    date text,
    user text,
    dept text,
    orderby int,
    files list<text>,
    users list<text>,
    family_members list<frozen <member>>,
    PRIMARY KEY ((id)));
CREATE INDEX on TestTable (user);
CREATE INDEX on TestTable (dept);
CREATE INDEX on TestTable (group);
CREATE INDEX on TestTable (date);
 Id  | OrderBy
-----+--------
 101 |       1
 102 |       2
 105 |       3
I want to change the existing orderby for the ids 105, 102, 103, in that order, i.e. (105, 1), (102, 2), (103, 3). I'm new to Cassandra, please help me. I think it is possible in SQL by using ROWNUM and a join.
I'm new to Cassandra
I can tell. The first clue was the order of your results. With id as your sole PRIMARY KEY (making it your partition key), your results would never come back sorted like that. This is how they should be sorted:
aploetz@cqlsh:stackoverflow> SELECT id,orderby,token(id) FROM testtable ;

 id  | orderby | system.token(id)
-----+---------+----------------------
 102 |       2 |  -963541259029995480
 105 |       3 |  2376737131193407616
 101 |       1 |  4965004472028601333
(3 rows)
Unbound queries always return results sorted by the hashed token value of the partition key. I have run the token() function on your partition key (id) to show this.
I want to change existing order by for following ids 105,102,103 in same order. i.e., (105, 1) (102, 2) (103, 3).
If all you need to do is change the values in the orderby column, that's easy:
aploetz@cqlsh:stackoverflow> INSERT INTO testtable(id,orderby) VALUES ('101',3);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable(id,orderby) VALUES ('102',2);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable(id,orderby) VALUES ('105',1);
aploetz@cqlsh:stackoverflow> SELECT id,orderby,token(id) FROM testtable ;

 id  | orderby | system.token(id)
-----+---------+----------------------
 102 |       2 |  -963541259029995480
 105 |       1 |  2376737131193407616
 101 |       3 |  4965004472028601333
(3 rows)
As Cassandra PRIMARY KEYs are unique, simply INSERTing a new non-key column value for that key changes orderby.
Now if you want to actually be able to sort your results by the orderby column, that's another issue entirely, and cannot be solved with your current model.
If that's what you really want to do, then you'll need a new table with a different PRIMARY KEY definition. So I'll create the same table with two changes: I'll name it testtable_by_group, and I'll use a composite PRIMARY KEY (group,orderby,id). Now I can query for a specific group "group1" and see the results sorted.
aploetz@cqlsh:stackoverflow> CREATE TABLE testtable_by_group (group text,id text,orderby int,PRIMARY KEY (group,orderby,id));
aploetz@cqlsh:stackoverflow> INSERT INTO testtable_by_group(group,id,orderby) VALUES ('group1','101',3);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable_by_group(group,id,orderby) VALUES ('group1','102',2);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable_by_group(group,id,orderby) VALUES ('group1','105',1);
aploetz@cqlsh:stackoverflow> SELECT group,id,orderby,token(group) FROM testtable_by_group WHERE group='group1';

 group  | id  | orderby | system.token(group)
--------+-----+---------+----------------------
 group1 | 105 |       1 | -2413872665919611707
 group1 | 102 |       2 | -2413872665919611707
 group1 | 101 |       3 | -2413872665919611707
(3 rows)
In this way, group is the new partition key. orderby is the first clustering key, so your rows within group are automatically sorted by it. id is on the end to ensure uniqueness, if any two rows have the same orderby.
Note that I left the token() function in the result set, but that I ran it on the new partition key (group). As you can see, the key of group1 is hashed to the same token for all 3 rows, which means that in a multi-node environment all 3 rows will be stored together. This can create a "hotspot" in your cluster, where some nodes have more data than others. That's why a good PRIMARY KEY definition ensures both query satisfaction and data distribution.
I wrote an article for DataStax on this topic a while back. Give it a read, and it should help you out: http://www.datastax.com/dev/blog/we-shall-have-order

SQL Multiple Row Insert w/ multiple selects from different tables

I am trying to do a multiple-row insert based on values that I am pulling from another table. Basically, I need to give all existing users access to a new service, where they previously had access to a different one. Table1 will take the data and run a job to do this.
INSERT INTO Table1 (id, serv_id, clnt_alias_id, serv_cat_rqst_stat)
SELECT
    (SELECT Max(id) + 1
     FROM Table1),
    '33',          -- The new service id
    clnt_alias_id,
    'PI'           -- The code to let the job know to grant access
FROM TABLE2
WHERE serv_id = '11'   -- The old service id
I am getting a Primary key constraint error on id.
Please help.
Thanks,
Colin
This query cannot work as written. The max(id) sub-select is evaluated only ONCE and returns the same value for all rows in the parent query:
MariaDB [test]> create table foo (x int);
MariaDB [test]> insert into foo values (1), (2), (3);
MariaDB [test]> select *, (select max(x)+1 from foo) from foo;
+------+----------------------------+
| x | (select max(x)+1 from foo) |
+------+----------------------------+
| 1 | 4 |
| 2 | 4 |
| 3 | 4 |
+------+----------------------------+
3 rows in set (0.04 sec)
You will have to run your query multiple times, once for each record you're trying to copy. That way the max(id) will get the ID from the previous query.
Is there a requirement that Table1.id be incremental ints? If not, just add the clnt_alias_id to Max(id). This is a nasty workaround though, and you should really try to get that column's type changed to auto_increment, like Marc B suggested.

Factor (string) to Numeric in PostgreSQL

Similar to this, is it possible to convert a String field to Numeric in PostgreSQL. For instance,
create table test (name text);
insert into test (name) values ('amy');
insert into test (name) values ('bob');
insert into test (name) values ('bob');
insert into test (name) values ('celia');
and add a field that is
name | num
-------+-----
amy | 1
bob | 2
bob | 2
celia | 3
The most effective "hash"-function of all is a serial primary key - giving you a unique number like you wished for in the question.
I also deal with duplicates in this demo:
CREATE TEMP TABLE string (
  string_id serial PRIMARY KEY
, string    text NOT NULL UNIQUE     -- no dupes
, ct        int  NOT NULL DEFAULT 1  -- count instead of dupe rows
);
Then you would enter new strings like this:
(Data-modifying CTE requires PostgreSQL 9.1 or later.)
WITH x AS (SELECT 'abc'::text AS nu)
   , y AS (
      UPDATE string s
      SET    ct = ct + 1
      FROM   x
      WHERE  s.string = x.nu
      RETURNING TRUE
      )
INSERT INTO string (string)
SELECT nu
FROM   x
WHERE  NOT EXISTS (SELECT 1 FROM y);
If the string nu already exists, the count (ct) is increased by 1. If not, a new row is inserted, starting with a count of 1.
The UNIQUE also adds an index on the column string.string automatically, which leads to optimal performance for this query.
Add additional logic (triggers ?) for UPDATE / DELETE to make this bullet-proof - if needed.
Note, there is a super-tiny race condition here, if two concurrent transactions try to add the same string at the same moment in time. To be absolutely sure, you could use SERIALIZABLE transactions. More info and links under this related question.
Live demo at sqlfiddle.
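Stripped of SQL, the insert-or-bump logic of that CTE is plain interning: each distinct string gets a serial id, and duplicates bump a counter. A minimal Python sketch of the same bookkeeping (in-memory only, with none of the CTE's concurrency handling):

```python
# Minimal in-memory analogue of the string table:
# each distinct string gets a serial id; duplicates bump a counter.
table = {}   # string -> [string_id, ct]
next_id = 1  # plays the role of the serial sequence

def upsert(s):
    global next_id
    if s in table:
        table[s][1] += 1         # UPDATE ... SET ct = ct + 1
    else:
        table[s] = [next_id, 1]  # INSERT ..., starting with ct = 1
        next_id += 1

for s in ('amy', 'bob', 'bob', 'celia'):
    upsert(s)

print(table)  # {'amy': [1, 1], 'bob': [2, 2], 'celia': [3, 1]}
```

This reproduces the numbering asked for in the question: amy -> 1, bob -> 2, celia -> 3, with bob's duplicate counted rather than stored twice.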
How 'bout a hash, such as md5, of name?
create table test (name text, hash text);
-- later
update test set hash = md5(name);
If you need to convert that md5 text to a number: Hashing a String to a Numeric Value in PostgresSQL
If they are all single characters, you could do this:
ALTER TABLE test ADD COLUMN num int;
UPDATE test SET num = ascii(name);
Though that would only return the character for the first letter if the string was more than a single character.
The exact case shown in your request can be produced with the dense_rank window function:
regress=# SELECT name, dense_rank() OVER (ORDER BY name) FROM test;
 name  | dense_rank
-------+------------
 amy   |          1
 bob   |          2
 bob   |          2
 celia |          3
(4 rows)
so if you were adding a number for each row, you'd be able to do something like:
ALTER TABLE test ADD COLUMN some_num integer;
WITH gen(gen_name, gen_num) AS
(SELECT name, dense_rank() OVER (ORDER BY name) FROM test GROUP BY name)
UPDATE test SET some_num = gen_num FROM gen WHERE name = gen_name;
ALTER TABLE test ALTER COLUMN some_num SET NOT NULL;
however I think it's much more sensible to use a hash or to assign generated keys. I'm just showing that your example can be achieved.
The biggest problem with this approach is that inserting new data is a pain. It's a ranking (like your example shows) so if you INSERT INTO test (name) VALUES ('billy'); then the ranking changes.
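That insertion pain is easy to see if you model dense_rank() directly. In this small Python sketch (equal names share a rank, and the rank advances once per distinct value), adding 'billy' shifts every later rank:

```python
# Emulate dense_rank() OVER (ORDER BY name): ties share a rank,
# and the rank increases by 1 for each new distinct value.
def dense_rank(names):
    ranks, rank, prev = {}, 0, object()
    for name in sorted(names):
        if name != prev:
            rank += 1
            prev = name
        ranks.setdefault(name, rank)
    return [(n, ranks[n]) for n in names]

print(dense_rank(['amy', 'bob', 'bob', 'celia']))
# [('amy', 1), ('bob', 2), ('bob', 2), ('celia', 3)]

print(dense_rank(['amy', 'billy', 'bob', 'bob', 'celia']))
# [('amy', 1), ('billy', 2), ('bob', 3), ('bob', 3), ('celia', 4)]
```

A rank is a property of the whole data set, not of a row, which is why a serial key or hash is the more robust choice for a stable identifier.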