postgres: Check pair of columns when inserting new row

I have a table like this:
 id | person | supporter | referredby
----+--------+-----------+------------
  0 | ABC    | DEF       |
  1 | ABC    | GHI       | DEF
  2 | CBA    | FED       |
  3 | CBA    | IHG       | FED
What I'm trying to accomplish: I'd like Postgres to reject an INSERT if the value in referredby isn't present in the supporter column for that specific person (a null referredby is OK).
For example, with the data above:
4, 'ABC', 'JKL', null: accepted (can be null)
4, 'ABC', 'JKL', 'IHG': rejected (IHG not listed as a supporter for ABC)
4, 'ABC', 'JKL', 'DEF': accepted (DEF is listed as a supporter for ABC)
Maybe a check constraint? I'm not sure how to piece it together

Add a foreign key that references (person, supporter). (That pair needs to be a unique key.)
alter table t add constraint cname unique(person, supporter);
alter table t add constraint fk foreign key (person, referredby)
references t (person, supporter);
(ANSI SQL syntax; PostgreSQL supports it as well.)
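A minimal end-to-end sketch of that approach (assuming the table is named t and the columns are text, as in the question's sample data):
CREATE TABLE t (
    id         integer PRIMARY KEY,
    person     text NOT NULL,
    supporter  text NOT NULL,
    referredby text,
    CONSTRAINT cname UNIQUE (person, supporter),
    CONSTRAINT fk FOREIGN KEY (person, referredby)
        REFERENCES t (person, supporter)
);
INSERT INTO t VALUES (0, 'ABC', 'DEF', NULL);   -- accepted: referredby is null
INSERT INTO t VALUES (1, 'ABC', 'GHI', 'DEF');  -- accepted: ('ABC', 'DEF') exists as a person/supporter pair
INSERT INTO t VALUES (4, 'ABC', 'JKL', 'IHG');  -- rejected: ('ABC', 'IHG') is not an existing person/supporter pair
Because the composite foreign key uses the default MATCH SIMPLE behaviour, any row with a NULL referredby passes the check automatically.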


store Perl hash data in a database

I have written Perl code that parses a text file and uses a hash to tally up the number of times a US State abbreviation appears in each file/record. I end up with something like this.
File: 521
OH => 4
PA => 1
IN => 2
TX => 3
IL => 7
I am struggling to find a way to store such hash results in an SQL database; I am using MariaDB. The structure of the data itself varies: one file will have some states and the next may have others, possibly a completely different group. I am even having trouble conceptualizing the table structure. What would be the best way to store data like this in a database?
There are many possible ways to store the data.
For the sake of simplicity, see if the following approach is an acceptable solution for your case. It is based on a single table with two indexes, one on the id column and one on the state column.
CREATE TABLE IF NOT EXISTS `state_count` (
`id` INT NOT NULL,
`state` VARCHAR(2) NOT NULL,
`count` INT NOT NULL,
INDEX `id` (`id`),
INDEX `state` (`state`)
);
INSERT INTO `state_count`
(`id`,`state`,`count`)
VALUES
('251','OH',4),
('251','PA',1),
('251','IN',2),
('251','TX',3),
('251','IL',7);
Sample SQL SELECT output
MySQL [dbs0897329]> SELECT * FROM state_count;
+-----+-------+-------+
| id  | state | count |
+-----+-------+-------+
| 251 | OH    |     4 |
| 251 | PA    |     1 |
| 251 | IN    |     2 |
| 251 | TX    |     3 |
| 251 | IL    |     7 |
+-----+-------+-------+
5 rows in set (0.000 sec)
MySQL [dbs0897329]> SELECT * FROM state_count WHERE state='OH';
+-----+-------+-------+
| id  | state | count |
+-----+-------+-------+
| 251 | OH    |     4 |
+-----+-------+-------+
1 row in set (0.000 sec)
MySQL [dbs0897329]> SELECT * FROM state_count WHERE state IN ('OH','TX');
+-----+-------+-------+
| id  | state | count |
+-----+-------+-------+
| 251 | OH    |     4 |
| 251 | TX    |     3 |
+-----+-------+-------+
2 rows in set (0.001 sec)
It's a little unclear in what direction your question goes. But if you want a good relational model to store the data into, that would be three tables: one for the files, one for the states, and one for the count of each state in a file. For example:
The tables:
CREATE TABLE file
  (id integer AUTO_INCREMENT,
   path varchar(256) NOT NULL,
   PRIMARY KEY (id),
   UNIQUE (path));
CREATE TABLE state
  (id integer AUTO_INCREMENT,
   abbreviation varchar(2) NOT NULL,
   PRIMARY KEY (id),
   UNIQUE (abbreviation));
CREATE TABLE occurrences
  (file integer,
   state integer,
   count integer NOT NULL,
   PRIMARY KEY (file, state),
   FOREIGN KEY (file) REFERENCES file (id),
   FOREIGN KEY (state) REFERENCES state (id),
   CHECK (count >= 0));
The data:
INSERT INTO file
            (path)
VALUES      ('521');
INSERT INTO state
            (abbreviation)
VALUES      ('OH'),
            ('PA'),
            ('IN'),
            ('TX'),
            ('IL');
INSERT INTO occurrences
            (file, state, count)
VALUES      (1, 1, 4),
            (1, 2, 1),
            (1, 3, 2),
            (1, 4, 3),
            (1, 5, 7);
The states of course would be reused. Fill the table with all 50 and use them. They should not be inserted for every file again.
You can fill occurrences explicitly with a count of 0 for files where the respective state didn't appear, if you want to distinguish between "I know it's 0." and "I don't know the count." (the latter would then be encoded by the absence of a corresponding row). If you don't want to make that distinction and no row simply means a count of 0, you can handle it in queries by using outer joins and coalesce() to "translate" missing rows to 0.
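For example, a quick sketch of such a query against the tables above (assuming you want every known state for file '521', with 0 where no occurrence row exists):
SELECT s.abbreviation,
       COALESCE(o.count, 0) AS count
  FROM state AS s
  LEFT JOIN occurrences AS o
         ON o.state = s.id
        AND o.file = (SELECT id FROM file WHERE path = '521')
 ORDER BY s.abbreviation;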

What if there are no possible primary keys in a table?

I have to design a wide table for a database (TimescaleDB, which will create hypertables based on date), but it seems like there are no possible primary keys, even if we are talking about composite keys.
| id | attribute1 | attribute2 | attribute3 | attribute4 | date_time           |
|----|------------|------------|------------|------------|---------------------|
| P1 | A          | 20         | NULL       | NULL       | 2021-01-01 00:00:00 |
| P1 | B          | 10         | NULL       | NULL       | 2021-01-01 00:00:00 |
| P1 | NULL       | NULL       | 200        | 300        | 2021-01-01 00:00:00 |
| P2 | C          | 25         | NULL       | NULL       | 2021-01-01 00:00:00 |
| P2 | NULL       | NULL       | 150        | 400        | 2021-01-01 00:00:00 |
The problem is that we are scraping data that describes P1, P2, etc. as a whole, and also data that describes only a part of P1 (A and B are parts of P1), of P2 (C), and so on.
Is there any way to make this work without splitting up the table?
You can follow the design below. This structure does not store any NULL values in the database:
create table parenttable
(
    id int identity,
    Name nvarchar(10),
    primary key (id)
);
create table childtable
(
    id int identity,
    parent_id int,
    attribute nvarchar(50),
    valueattribute nvarchar(50),
    date_time datetime,
    primary key (id),
    foreign key (parent_id) references parenttable
);
insert into parenttable values
('P1'),
('P2')
insert into childtable values
(1,'attribute1','A','2021-01-01 00:00:00'),
(1,'attribute2','20','2021-01-01 00:00:00'),
(1,'attribute1','B','2021-01-01 00:00:00'),
(1,'attribute2','10','2021-01-01 00:00:00'),
(1,'attribute3','200','2021-01-01 00:00:00'),
(1,'attribute4','300','2021-01-01 00:00:00'),
(2,'attribute1','C','2021-01-01 00:00:00'),
(2,'attribute2','25','2021-01-01 00:00:00'),
(2,'attribute3','150','2021-01-01 00:00:00'),
(2,'attribute4','400','2021-01-01 00:00:00')
select *
from parenttable p join childtable c on p.id = c.parent_id
result in dbfiddle: https://dbfiddle.uk
Attribute1 and attribute2 will always be NULL in the rows for P1 itself; those columns describe A and B (which belong to P1). Similarly, attribute3 and attribute4 are always going to be NULL in the rows for A, B, C, etc., because those attributes describe P1.
There is not enough information in your problem statement to answer your question.
I don't fully understand the description above, but it's enough to tell me you need to apply functional dependency analysis and create as many tables as there are kinds of things being described.
attribute3 and attribute4 ... are describing P1
That suggests you should have a table representing P1 things, with attribute3 and attribute4 as columns (preferably with meaningful names).
Organize your tables around the things you're modeling.
Look for columns that cannot be NULL for particular things. Those belong in the table depicting one kind of thing.
Then look for columns that might be NULL for a certain kind of thing. Those can be NULL-able columns, or a separate table sharing the same key, with optional cardinality.
There are no other kinds of columns.
Once you've grouped your columns into tables and distinguished what's necessary from what's not, you can look over the mandatory columns for a candidate key. There is always such a key, even if it includes all the non-NULL columns. Why? Because two identical rows are indistinguishable from each other. If you think you need two such rows, what you really need is one row, plus a quantity column (not in the key) indicating how many such exist.
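To make that concrete, a rough sketch of such a split in PostgreSQL terms (all table and column names here are placeholders, assuming 'P1'/'P2' are whole entities and 'A'/'B'/'C' are their parts; the real columns should get meaningful names):
-- readings describing P1, P2, ... as a whole (attribute3, attribute4)
CREATE TABLE entity_reading (
    entity_id  text        NOT NULL,   -- 'P1', 'P2', ...
    date_time  timestamptz NOT NULL,
    attribute3 integer,
    attribute4 integer,
    PRIMARY KEY (entity_id, date_time)
);
-- readings describing a single part of an entity (attribute1 names the part, attribute2 is its value)
CREATE TABLE part_reading (
    entity_id  text        NOT NULL,
    part       text        NOT NULL,   -- 'A', 'B', 'C', ...
    date_time  timestamptz NOT NULL,
    attribute2 integer,
    PRIMARY KEY (entity_id, part, date_time)
);
Each table then has a natural candidate key made of its mandatory columns.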

Make field unique per user Id in Sequelize?

Scenario:
A user has 3 choices: 1st choice, 2nd choice, 3rd choice. Every choice is saved in the db with a choice number and a user id.
How do I add a unique validation so there can't be TWO choice #2 rows for user #11?
Reading the documentation, it seems you can only apply it to a single column, like email.
Is this what you want?
CREATE UNIQUE INDEX idxu_user_choice ON my_table(choice_id, user_id);
SQL Fiddle
PostgreSQL 9.6 Schema Setup:
CREATE TABLE t
("user_id" int, "choice_id" int, "comment" varchar(8))
;
CREATE UNIQUE INDEX idxu_user_choice ON t(choice_id, user_id);
INSERT INTO t
("user_id", "choice_id", "comment")
VALUES
(1, 1, 'azer'),
(1, 2, 'zsdsfsdf'),
(2, 1, 'sdghlk')
;
Query 1:
INSERT INTO t ("user_id", "choice_id", "comment")
VALUES (2, 2, 'dfgdfgh')
Results:
Query 2:
select * from t
Results:
| user_id | choice_id | comment  |
|---------|-----------|----------|
|       1 |         1 | azer     |
|       1 |         2 | zsdsfsdf |
|       2 |         1 | sdghlk   |
|       2 |         2 | dfgdfgh  |
Query 3:
INSERT INTO t ("user_id", "choice_id", "comment")
VALUES (1, 2, 'cvbkll')
Results:
ERROR: duplicate key value violates unique constraint "idxu_user_choice" Detail: Key (choice_id, user_id)=(2, 1) already exists.

Postgresql Sequence vs Serial

I was wondering when it is better to choose sequence, and when it is better
to use serial.
What I want is returning last value after insert using
SELECT LASTVAL();
I read this question
PostgreSQL Autoincrement
I have never used serial before.
Check out a nice answer about Sequence vs. Serial.
Sequence will just create a sequence of unique numbers. It's not a datatype; it is a sequence object. For example:
create sequence testing1;
select nextval('testing1'); -- 1
select nextval('testing1'); -- 2
You can use the same sequence in multiple places like this:
create sequence testing1;
create table table1(id int not null default nextval('testing1'), firstname varchar(20));
create table table2(id int not null default nextval('testing1'), firstname varchar(20));
insert into table1 (firstname) values ('tom'), ('henry');
insert into table2 (firstname) values ('tom'), ('henry');
select * from table1;
| id | firstname |
|----|-----------|
|  1 | tom       |
|  2 | henry     |
select * from table2;
| id | firstname |
|----|-----------|
|  3 | tom       |
|  4 | henry     |
Serial is a pseudo-datatype. It will create a sequence object. Let's take a look at a straightforward table (similar to the one you will see in the link).
create table test(field1 serial);
This will cause a sequence to be created along with the table. The sequence name's nomenclature is <tablename>_<fieldname>_seq. The above one is the equivalent of:
create sequence test_field1_seq;
create table test(field1 int not null default nextval('test_field1_seq'));
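As a side note (per the PostgreSQL documentation on serial types), serial additionally marks the sequence as owned by the column, so dropping the column or table drops the sequence as well:
ALTER SEQUENCE test_field1_seq OWNED BY test.field1;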
Also see: http://www.postgresql.org/docs/9.3/static/datatype-numeric.html
You can reuse the sequence that is auto-created by the serial datatype, or you may choose to just use one serial/sequence per table.
create table table3(id serial, firstname varchar(20));
create table table4(id int not null default nextval('table3_id_seq'), firstname varchar(20));
(The risk here is that if table3 is dropped and we continue using table3's sequence, we will get an error)
create table table5(id serial, firstname varchar(20));
insert into table3 (firstname) values ('tom'), ('henry');
insert into table4 (firstname) values ('tom'), ('henry');
insert into table5 (firstname) values ('tom'), ('henry');
select * from table3;
| id | firstname |
|----|-----------|
|  1 | tom       |
|  2 | henry     |
select * from table4; -- this uses the sequence created for table3
| id | firstname |
|----|-----------|
|  3 | tom       |
|  4 | henry     |
select * from table5;
| id | firstname |
|----|-----------|
|  1 | tom       |
|  2 | henry     |
Feel free to try out an example: http://sqlfiddle.com/#!15/074ac/1
2021 answer using identity
I was wondering when it is better to choose sequence, and when it is better to use serial.
Not an answer to the whole question (only to the part quoted above), but I guess it could help future readers: you should use neither sequence nor serial; prefer identity columns instead:
create table apps (
id integer primary key generated always as identity
);
See this detailed answer: https://stackoverflow.com/a/55300741/978690 (and also https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_serial)
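As for the "returning last value after insert" part of the question: with an identity (or serial) column you usually don't need SELECT LASTVAL() at all, since INSERT ... RETURNING hands the generated value back directly. A small sketch against the apps table above:
insert into apps default values returning id;  -- returns the newly generated id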

Write SQL script to insert data

In a database that contains many tables, I need to write a SQL script to insert data if it does not already exist.
Table currency
| id | Code | lastupdate | rate |
+----+------+------------+------+
| 1  | USD  | 05-11-2012 | 2    |
| 2  | EUR  | 05-11-2012 | 3    |
Table client
| id | name | createdate | currencyId |
+----+------+------------+------------+
| 4  | tony | 11-24-2010 | 1          |
| 5  | john | 09-14-2010 | 2          |
Table account
| id | number | createdate | clientId |
+----+--------+------------+----------+
| 7  | 1234   | 12-24-2010 | 4        |
| 8  | 5648   | 12-14-2010 | 5        |
I need to insert to:
currency (id=3, Code=JPY, lastupdate=today, rate=4)
client (id=6, name=Joe, createdate=today, currencyId=Currency with Code 'USD')
account (id=9, number=0910, createdate=today, clientId=Client with name 'Joe')
Problem:
the script must check whether a row exists before inserting new data
the script must allow us to set a foreign key on the new row that refers to a row already in the database (such as currencyId in the client table)
the script must allow us to put the current datetime into a column in the insert statement (such as createdate in the client table)
the script must allow us to set a foreign key on the new row that refers to a row inserted earlier in the same script (such as clientId in the account table)
Note: I tried the following SQL statement but it solved only the first problem
INSERT INTO Client (id, name, createdate, currencyId)
SELECT 6, 'Joe', '05-11-2012', 1
WHERE not exists (SELECT * FROM Client where id=6);
This query runs without any error, but as you can see I wrote createdate and currencyId manually; I need to take the currency id from a select statement with a where clause (I tried to substitute the 1 with a select statement, but the query failed).
This is an example about what I need, in my database, I need this script to insert more than 30 rows in more than 10 tables.
Any help is appreciated.
You wrote
I tried to substitute 1 by select statement but query failed
But I wonder why it failed? What did you try? This should work:
INSERT INTO Client (id, name, createdate, currencyId)
SELECT
6,
'Joe',
current_date,
(select c.id from currency as c where c.code = 'USD') as currencyId
WHERE not exists (SELECT * FROM Client where id=6);
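The same pattern should also cover your last requirement; here is a hedged sketch for the account row, where clientId is looked up from the client row inserted just above (assuming number is a character column; drop the quotes if it is numeric):
INSERT INTO account (id, number, createdate, clientId)
SELECT
    9,
    '0910',
    current_date,
    (select cl.id from client as cl where cl.name = 'Joe')
WHERE not exists (SELECT * FROM account where id=9);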
It looks like you can work out if the data exists.
Here is a quick bit of code written in SQL Server / Sybase that I think answers your basic questions:
create table currency(
id numeric(16,0) identity primary key,
code varchar(3) not null,
lastupdated datetime not null,
rate smallint
);
create table client(
id numeric(16,0) identity primary key,
createddate datetime not null,
currencyid numeric(16,0) foreign key references currency(id)
);
insert into currency (code, lastupdated, rate)
values('EUR',GETDATE(),3)
--inserts the date and last allocated identity into client
insert into client(createddate, currencyid)
values(GETDATE(), @@IDENTITY)
go