PostgreSQL two tables with expressions relations - sql

I need to relate two tables through logical (binary) expressions between their rows. I'll try to clarify. I have two tables:
--First
id | Name
1 | First Test 1
2 | First Test 2
--Second
id | Name
1 | Second Test 1
2 | Second Test 2
I want to be able to link the two tables with a logical expression, like the pseudo code below:
First(id=1) => Second(id=1) && (AND) Second(id=2)
Something like one-to-many, but with a logical operator between all the relations. Is there a straightforward way of doing this?
Thanks in advance,
Julian
UPDATE:
As @Rezu requested: I want to be able to write a query that will return a text, for example:
First Test 1 := Second Test 1 AND Second Test 2
The AND part can be AND, OR, NOT, etc.
Hope this clarifies what I want to achieve.
UPDATE 1:
This is almost what I'd like to achieve. The query result is this:
First Test 1 := Second Test 1
First Test 1 := Second Test 2
First Test 2 := Second Test 3
What I want to achieve is:
First Test 1 := Second Test 1 AND Second Test 2
First Test 2 := Second Test 3
Hope that explains my goal.

Basically, this is my solution. Maybe there is a better one, but this is what I came up with:
create table first (
    id serial primary key,
    name text
);
insert into first (name) values
('First Test 1'), ('First Test 2');

create table second (
    id serial primary key,
    name text
);
insert into second (name) values
('Second Test 1'), ('Second Test 2'), ('Second Test 3');

create table first_second (
    first_id bigint,
    second_id bigint,
    logical_operator text
);
insert into first_second (first_id, second_id, logical_operator) values
(1, 1, ''), (1, 2, 'AND'), (2, 3, '');
And the query is:
SELECT first.name || ' := ' ||
       string_agg(first_second.logical_operator || ' ' || second.name, ' ') AS name
FROM first
JOIN first_second ON first_second.first_id = first.id
JOIN second ON first_second.second_id = second.id
GROUP BY first.name;
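One caveat: string_agg() does not guarantee aggregation order unless you specify one, so the operators could attach to the wrong names. A minimal refinement (assuming second_id reflects the intended order of the expression) pins the order down; ltrim() also strips the leading space left by the first, empty operator:
SELECT first.name || ' := ' ||
       ltrim(string_agg(first_second.logical_operator || ' ' || second.name,
                        ' ' ORDER BY first_second.second_id)) AS name
FROM first
JOIN first_second ON first_second.first_id = first.id
JOIN second ON first_second.second_id = second.id
GROUP BY first.name;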

Related

use case for uuid in database

Let's say, for example, I have a table called test and the data is like this:
id name
1 John
2 Jay
3 Maria
Let's suppose test gets updated, and now the names are for some reason allocated to different ids. Consider the name column a unique column; it's just not the primary key of test, but it is unique.
Next time I query test it may look like this:
id name
10 John
12 Jay
13 Maria
So in that case the id changed, but the name is consistent and can be traced back to the previous state of the test table. I believe it is bad practice to change ids like that, but I don't have control over this table, and this is how some folks currently handle the data. I would like to know if this is a good case for using uuid? I'm not familiar with the concept of uuid, or with how best to create something consistent, uniquely identifiable, and fast to search when I want to handle the data changes in this table. I would like to import this table on my end, but create a key that is fast and that will not change during data imports.
I feel like the problem you're trying to solve isn't clear.
Problem 1: The id column keeps getting updated. This seems weird, so getting to the root of why that is happening seems like the real issue to resolve.
Problem 2: Uniquely identifying rows. You would like to use the id column or a new uuid column to uniquely identify rows, but you've already said you can uniquely identify rows with the name column, so what problem are you trying to solve here?
Problem 3: Performance. You're going to get best performance using an indexed integer (preferably primary key) column. Most likely id in this case. uuid won't help with performance.
Problem 4: Data changing on imports. This is likely due to auto increments or initial values set differently in DDL. You need to get a better understanding of what exactly is going wrong with your import.
Problem 5: If you don't have control over the values of the id column how would you be able to add your own uuid?
A uuid is just a way of creating a unique value.
Oracle has a function to create a uuid: random_uuid().
This is an XY-problem.
You have data in your table with a unique key of a given data type; when the data gets reloaded, the unique key is regenerated, so all the rows get new unique values.
The data type you use does not matter; the issue is with the process of reloading the data and regenerating the unique keys. If you use a different data type and the unique keys are still regenerated then you have exactly the same problem.
Fix that problem first.
Putting aside the reasons for this question and whether it makes sense or not: if I got it right, it is about generating a unique key from NAME, which is unique itself.
If that is the case, then you could create your own function to do the job:
CREATE OR REPLACE FUNCTION NAME_2_ID(p_name VARCHAR2) RETURN NUMBER AS
BEGIN
  Declare
      mRet   Number(16);
      mAlpha VarChar2(64);
      mWrk   Number(16) := 0;
      mANum  VarChar2(4000) := '';
  Begin
      IF p_name Is Null Then
          mRet := 0;
          GOTO End_It;
      END IF;
      --
      mAlpha := ' ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'',.!?"()[]{}';
      -- Replacing alpha to numeric: record each character's position in mAlpha and sum the positions
      For i In 1 .. Length(p_name) Loop
          mANum := mANum || SubStr(p_name, i, 1) || To_Char(InStr(mAlpha, Upper(SubStr(p_name, i, 1)))) || '~';
          mWrk  := mWrk + InStr(mAlpha, Upper(SubStr(p_name, i, 1)));
      End Loop;
      mRet := mWrk * Length(mANum);
      <<End_It>>
      RETURN(mRet);
  End;
END NAME_2_ID;
As your ID column in the TEST table changes, like in the sample data:
WITH
test_1 AS (
    Select 1 "ID", 'John'  "A_NAME" From Dual Union All
    Select 2 "ID", 'Jay'   "A_NAME" From Dual Union All
    Select 3 "ID", 'Maria' "A_NAME" From Dual
),
test_2 AS (
    Select 10 "ID", 'John'  "A_NAME" From Dual Union All
    Select 12 "ID", 'Jay'   "A_NAME" From Dual Union All
    Select 13 "ID", 'Maria' "A_NAME" From Dual
)
... you can get the same ID_2 whenever you query the table (if the name didn't change) ...
Select
    ID,
    A_NAME,
    NAME_2_ID(A_NAME) "ID_2"
From
    test_1
/*
ID A_NAME ID_2
---------- ------ ----------
1 John 765
2 Jay 429
3 Maria 846
*/
-- -------------------------
... ... ...
From
test_2
/*
ID A_NAME ID_2
---------- ------ ----------
10 John 765
12 Jay 429
13 Maria 846
*/

Why multiset union is not working when I'm trying to concatenate a null with some number in plsql?

So I have two nested tables, and I want to make a new one with the elements from both of them. The first nested table has a null value and the second one a number, and I want the result to be the number from the second one, but it prints the null value. Is it possible to make a union between a null and a number with multiset union?
To answer your question: yes, it is possible to "make a union between a null and a number with multiset union". But what you end up with is two entries in the nested table:
SQL> update test
2 set marks = numberlist(null) multiset union all numberlist(42)
3 where id_std = 1
4 /
SQL> select id_std
2 , t2.column_value as mark
3 from test t1
4 , table(t1.marks) t2
5 /
ID_STD MARK
------ ----
1
1 42
SQL>
I suspect this effect is actually what you're complaining about. However, the null mark is still a valid entry. If you want to overwrite it, you need to provide different logic.
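One possible approach (a sketch; it assumes the numberlist collection type and test table implied by the question) is to unnest the union and re-collect only the non-null values:
UPDATE test
SET marks = CAST(MULTISET(SELECT t.column_value
                          FROM TABLE(numberlist(NULL) MULTISET UNION ALL numberlist(42)) t
                          WHERE t.column_value IS NOT NULL) AS numberlist)
WHERE id_std = 1;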

Is it possible to use a PG sequence on a per record label?

Does PostgreSQL 9.2+ provide any functionality to make it possible to generate a sequence that is namespaced to a particular value? For example:
.. | user_id | seq_id | body      | ...
---+---------+--------+-----------+----
 - |       4 |      1 | "abc...." |
 - |       4 |      2 | "def...." |
 - |       5 |      1 | "ghi...." |
 - |       5 |      2 | "xyz...." |
 - |       5 |      3 | "123...." |
This would be useful for generating custom URLs for the user:
domain.me/username_4/posts/1
domain.me/username_4/posts/2
domain.me/username_5/posts/1
domain.me/username_5/posts/2
domain.me/username_5/posts/3
I did not find anything in the PG docs (regarding sequence and sequence functions) to do this. Are sub-queries in the INSERT statement or with custom PG functions the only other options?
You can use a subquery in the INSERT statement like @Clodoaldo demonstrates. However, this defeats the nature of a sequence as being safe to use in concurrent transactions; it will result in race conditions and eventually in duplicate key violations.
You should rather rethink your approach: use just one plain sequence for your table and combine it with user_id to get the sort order you want.
You can always generate the custom URLs with the desired numbers using row_number() with a simple query like:
SELECT format('domain.me/username_%s/posts/%s'
            , user_id
            , row_number() OVER (PARTITION BY user_id ORDER BY seq_id))
FROM   tbl;
Maybe this answer is a little off-piste, but I would consider partitioning the data and giving each user their own partitioned table for posts.
There's a bit of overhead to the setup, as you will need triggers for managing the DDL statements for the partitions, but it would effectively result in each user having their own table of posts, along with their own sequence, with the added benefit of still being able to treat all posts as one big table.
General gist of the concept...
psql# CREATE TABLE posts (user_id integer, seq_id integer);
CREATE TABLE
psql# CREATE TABLE posts_001 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# CREATE TABLE posts_002 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# select * from posts;
user_id | seq_id
---------+--------
1 | 1
1 | 2
2 | 1
2 | 2
(4 rows)
I left out some rather important CHECK constraints in the above setup; make sure you read the docs for how these kinds of setups are used.
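A minimal sketch of what those constraints could look like, assuming one child table per user as above (the constraint names are made up):
ALTER TABLE posts_001 ADD CONSTRAINT posts_001_user_check CHECK (user_id = 1);
ALTER TABLE posts_002 ADD CONSTRAINT posts_002_user_check CHECK (user_id = 2);
With constraint_exclusion enabled, these checks also let the planner skip child tables that cannot match a user_id filter on the parent.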
insert into t (user_id, seq_id) values
(4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4))
Check for a duplicate primary key error in the front end and retry if needed.
Update
Although @Erwin's advice is sensible (that is, a single sequence with the ordering done in the select query), it can be expensive.
If you don't use a sequence, there is nothing about the nature of a sequence to be defeated, and it will not result in duplicate key violations. To demonstrate it, I created a table and made a python script to insert into it. I launched 3 parallel instances of the script, inserting as fast as possible. And it just works.
The table must have a primary key on those columns:
create table t (
user_id int,
seq_id int,
primary key (user_id, seq_id)
);
The python script:
#!/usr/bin/env python
import psycopg2, psycopg2.extensions

query = """
begin;
insert into t (user_id, seq_id) values
(4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4));
commit;
"""

conn = psycopg2.connect('dbname=cpn user=cpn')
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
cursor = conn.cursor()

# retry each insert until it no longer collides with a concurrent writer
for i in range(0, 1000):
    while True:
        try:
            cursor.execute(query)
            break
        except psycopg2.IntegrityError as e:
            print(e.pgerror)
            cursor.execute("rollback;")

cursor.close()
conn.close()
After the parallel run:
select count(*), max(seq_id) from t;
count | max
-------+------
3000 | 3000
Just as expected. I developed at least two applications using that logic, and one of them is more than 13 years old and has never failed. I concede that if you are Facebook or some other giant, then you could have a problem.
Yes:
CREATE TABLE your_table
(
column type DEFAULT NEXTVAL(sequence_name),
...
);
More details here:
http://www.postgresql.org/docs/9.2/static/ddl-default.html
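A concrete sketch (the sequence and table names here are made up for illustration):
CREATE SEQUENCE posts_seq;
CREATE TABLE posts (
    post_id bigint DEFAULT nextval('posts_seq') PRIMARY KEY,
    user_id integer NOT NULL,
    body    text
);
-- post_id is filled in from the sequence automatically
INSERT INTO posts (user_id, body) VALUES (4, 'abc....');
Note this is one global sequence for the whole table, not a per-user one; per-user numbering still has to come from something like the row_number() approach shown above.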

Factor (string) to Numeric in PostgreSQL

Similar to this, is it possible to convert a string field to numeric in PostgreSQL? For instance,
create table test (name text);
insert into test (name) values ('amy');
insert into test (name) values ('bob');
insert into test (name) values ('bob');
insert into test (name) values ('celia');
and add a field that is
name | num
-------+-----
amy | 1
bob | 2
bob | 2
celia | 3
The most effective "hash" function of all is a serial primary key, giving you a unique number like you wished for in the question.
I also deal with duplicates in this demo:
CREATE TEMP TABLE string (
string_id serial PRIMARY KEY
,string text NOT NULL UNIQUE -- no dupes
,ct int NOT NULL DEFAULT 1 -- count instead of dupe rows
);
Then you would enter new strings like this (a data-modifying CTE requires PostgreSQL 9.1 or later):
WITH x AS (SELECT 'abc'::text AS nu)
   , y AS (
       UPDATE string s
       SET    ct = ct + 1
       FROM   x
       WHERE  s.string = x.nu
       RETURNING TRUE
     )
INSERT INTO string (string)
SELECT nu
FROM   x
WHERE  NOT EXISTS (SELECT 1 FROM y);
If the string nu already exists, the count (ct) is increased by 1. If not, a new row is inserted, starting with a count of 1.
The UNIQUE constraint also adds an index on the column string.string automatically, which leads to optimal performance for this query.
Add additional logic (triggers?) for UPDATE / DELETE to make this bullet-proof, if needed.
Note, there is a super-tiny race condition here if two concurrent transactions try to add the same string at the same moment in time. To be absolutely sure, you could use SERIALIZABLE transactions. More info and links under this related question.
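On PostgreSQL 9.5 or later (the answer above targets 9.1), the same logic can be written race-free with INSERT ... ON CONFLICT; a minimal sketch against the string table above:
INSERT INTO string (string)
VALUES ('abc')
ON CONFLICT (string) DO UPDATE
SET ct = string.ct + 1;  -- string.ct refers to the existing row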
How 'bout a hash, such as md5, of name?
create table test (name text, hash text);
-- later
update test set hash = md5(name);
If you need to convert that md5 text to a number: Hashing a String to a Numeric Value in PostgreSQL
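One common trick for that conversion (a sketch, not necessarily what the linked answer does) is to reinterpret the first 8 hex digits of the md5 as a 32-bit integer:
ALTER TABLE test ADD COLUMN num int;
-- 'x' followed by hex digits casts to a bit string, which casts to a (possibly negative) int
UPDATE test SET num = ('x' || substr(md5(name), 1, 8))::bit(32)::int;
Equal names still get equal numbers, as in the question's example, but the values are arbitrary rather than 1, 2, 3.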
If they are all single characters, you could do this:
ALTER TABLE test ADD COLUMN num int;
UPDATE test SET num = ascii(name);
Though that would only use the code of the first character if the string is more than a single character.
The exact case shown in your question can be produced with the dense_rank window function:
regress=# SELECT name, dense_rank() OVER (ORDER BY name) FROM test;
name | dense_rank
-------+------------
amy | 1
bob | 2
bob | 2
celia | 3
(4 rows)
so if you were adding a number for each row, you'd be able to do something like:
ALTER TABLE test ADD COLUMN some_num integer;
WITH gen(gen_name, gen_num) AS
(SELECT name, dense_rank() OVER (ORDER BY name) FROM test GROUP BY name)
UPDATE test SET some_num = gen_num FROM gen WHERE name = gen_name;
ALTER TABLE test ALTER COLUMN some_num SET NOT NULL;
However, I think it's much more sensible to use a hash or to assign generated keys; I'm just showing that your example can be achieved.
The biggest problem with this approach is that inserting new data is a pain. It's a ranking (as your example shows), so if you INSERT INTO test (name) VALUES ('billy'); then the ranking changes.
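To see the problem, insert the new name and re-run the ranking; bob's number shifts from 2 to 3:
INSERT INTO test (name) VALUES ('billy');
SELECT name, dense_rank() OVER (ORDER BY name) FROM test;
-- amy 1, billy 2, bob 3, bob 3, celia 4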

Search for element in array of composite types

Using PostgreSQL 9.0
I have the following table setup
CREATE TABLE person (age integer, last_name text, first_name text, address text);
CREATE TABLE my_people (mperson person[]);
INSERT INTO my_people VALUES(array[ROW(44, 'John', 'Smith', '1234 Test Blvd.')::person]);
Now, I want to be able to write a select statement that can search and compare values of the composite types inside my mperson array column.
Example:
SELECT * FROM my_people WHERE 20 > ANY( (mperson) .age);
However, when trying to execute this query I get the following error:
ERROR: column notation .age applied to type person[], which is not a composite type
LINE 1: SELECT mperson FROM my_people WHERE 20 > ANY((mperson).age);
So, you can see I'm trying to test the values of the composite type inside my array.
I know I'm not supposed to use arrays and composites in my tables, but this best suits our application's requirements.
Also, we have several nested composite arrays, so a generic solution that would allow me to search many levels would be appreciated.
The ANY construction in your case looks redundant. You can write the query this way:
SELECT * FROM my_people WHERE (mperson[1]).age < 20;
Of course, if you have multiple values in this array, that won't work; but you can't get at the exact array element the other way either.
Why do you need arrays at all? You can just write one element of type person per row.
Check also the excellent HStore module, which might better suit your generic needs.
Temporary test setup:
CREATE TEMP TABLE person (age integer, last_name text, first_name text, address text);
CREATE TEMP TABLE my_people (mperson person[]);
-- test-data, demonstrating 3 different syntax styles:
INSERT INTO my_people (mperson)
VALUES
  (array[(43, 'Stack', 'Over', '1234 Test Blvd.')::person])
, (array['(44,John,Smith,1234 Test Blvd.)'::person,
         '(21,Maria,Smith,1234 Test Blvd.)'::person])
, ('{"(33,John,Miller,12 Test Blvd.)",
     "(22,Frank,Miller,12 Test Blvd.)",
     "(11,Bodi,Miller,12 Test Blvd.)"}');
Call (almost the solution):
SELECT (p).*
FROM (
SELECT unnest(mperson) AS p
FROM my_people) x
WHERE (p).age > 33;
Returns:
age | last_name | first_name | address
-----+-----------+------------+-----------------
43 | Stack | Over | 1234 Test Blvd.
44 | John | Smith | 1234 Test Blvd.
The key is the unnest() function, which is available in 9.0.
Your mistake in the example is that you forgot about the ARRAY layer in between. unnest() returns one row per base element; then you can access the columns of the composite type as demonstrated.
Brave new world
IF you actually want whole people (entire rows) instead of the individuals that fit the criteria, I propose you add a primary key to the table and proceed as follows:
CREATE TEMP TABLE my_better_people (id serial, mperson person[]);
-- shortcut to populate the new world by emigration from the old world ;)
INSERT INTO my_better_people (mperson)
SELECT mperson FROM my_people;
Find individuals:
SELECT id, (p).*
FROM (
SELECT id, unnest(mperson) AS p
FROM my_better_people) x
WHERE (p).age > 20;
Find whole people (solution):
SELECT *
FROM my_better_people p
WHERE EXISTS (
SELECT 1
FROM (
SELECT id, unnest(mperson) AS p
FROM my_better_people
) x
WHERE (p).age > 20
AND x.id = p.id
);
You can do it without a primary key, but that would be foolish.
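For reference, a sketch of the same whole-people filter without the primary key: correlate unnest() with each row's own array inside EXISTS (outer references are allowed in a subquery's FROM here):
SELECT *
FROM my_people t
WHERE EXISTS (
    SELECT 1
    FROM unnest(t.mperson) p
    WHERE p.age > 20
);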