Use DISTINCT ON with empty n:n relations - sql

I'm a new user of PostgreSQL, trying to use DISTINCT ON but I can't reach my goal.
Here's a brief sketch of my database:
- files with versioning
- fields with a model (for form generation purposes)
- n:n relations between files' versions and fields
I would like to retrieve the whole set of fields for a specified file's version.
My problem is that we could have (and we will have) empty values, i.e. missing FileVersion_Field relations. I'll try to give an example below:
FileVersion
+----------------+---------+---------+
| id_fileversion | id_file | version |
+----------------+---------+---------+
|              1 |       1 |       1 |
|              2 |       1 |       2 |
+----------------+---------+---------+

Field
+----------+-------+---------------+
| id_field | value | id_fieldmodel |
+----------+-------+---------------+
|        1 | Smith |             1 |
|        2 | 20    |             2 |
|        3 | 25    |             2 |
+----------+-------+---------------+

FileVersion_Field
+----------------+----------+
| id_fileversion | id_field |
+----------------+----------+
|              1 |        1 |
|              1 |        2 |
|              2 |        3 |
+----------------+----------+

FieldModel
+---------------+------+
| id_fieldmodel | type |
+---------------+------+
|             1 | Name |
|             2 | Age  |
+---------------+------+
In this example, I would like to get these results:
-- id_file=1 & version=1
Name | Smith
Age | 20
-- id_file=1 & version=2
Name |
Age | 25
Here's what I've tried, which doesn't work:
SELECT DISTINCT ON(FieldModel.id_fieldmodel) *
FROM File
LEFT JOIN FileVersion ON File.id_file = FileVersion.id_file
LEFT JOIN FileVersion_Field ON FileVersion.id_fileversion = FileVersion_Field.id_fileversion
LEFT JOIN Field ON FileVersion_Field.id_field = Field.id_field
RIGHT JOIN FieldModel ON (Field.id_fieldmodel = FieldModel.id_fieldmodel OR FieldModel.id_fieldmodel IS NULL)
WHERE (FieldModel.id_fieldmodel IS NOT NULL AND FileVersion.version = 2 AND File.id_file = 1)
OR (Field.id_fieldmodel IS NULL)
ORDER BY FieldModel.id_fieldmodel;
-- Sample Structure
CREATE TABLE File (
    id_file integer PRIMARY KEY
);
CREATE TABLE FieldModel (
    id_fieldmodel integer PRIMARY KEY,
    type varchar(50)
);
CREATE TABLE FileVersion (
    id_fileversion integer PRIMARY KEY,
    id_file integer,
    version integer,
    CONSTRAINT fk_fileversion_file FOREIGN KEY (id_file) REFERENCES File (id_file)
);
CREATE TABLE Field (
    id_field integer PRIMARY KEY,
    id_fieldmodel integer,
    value varchar(255),
    CONSTRAINT fk_field_fieldmodel FOREIGN KEY (id_fieldmodel) REFERENCES FieldModel (id_fieldmodel)
);
CREATE TABLE FileVersion_Field (
    id_fileversion integer,
    id_field integer,
    PRIMARY KEY (id_fileversion, id_field),
    CONSTRAINT fk_fileversionfield_fileversion FOREIGN KEY (id_fileversion) REFERENCES FileVersion (id_fileversion),
    CONSTRAINT fk_fileversionfield_field FOREIGN KEY (id_field) REFERENCES Field (id_field)
);
-- Sample Data
INSERT INTO File (id_file) VALUES (1);
INSERT INTO FileVersion (id_fileversion, id_file, version) VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO FieldModel (id_fieldmodel, type) VALUES (1, 'Name'), (2, 'Age');
INSERT INTO Field (id_field, id_fieldmodel, value) VALUES (1, 1, 'Smith'), (2, 2, '20'), (3, 2, '25');
INSERT INTO FileVersion_Field (id_fileversion, id_field) VALUES (1, 1), (1, 2), (2, 3);

7 years later, time to exorcize my daemons!
I just needed to change my way of thinking.
First, we need the list of all used FieldModel for a File, whatever the version:
SELECT DISTINCT(fm.id_fieldmodel), fm.type
FROM FieldModel fm
LEFT JOIN Field f ON fm.id_fieldmodel = f.id_fieldmodel
LEFT JOIN FileVersion_Field fvf ON f.id_field = fvf.id_field
LEFT JOIN FileVersion fv ON fv.id_fileversion = fvf.id_fileversion
WHERE fv.id_file = 1;
-- id_fieldmodel | type
-- ---------------+------
--              1 | Name
--              2 | Age
Now, we need the list of Field for the same File, but this time with a specified version:
SELECT f.id_fieldmodel, f.value
FROM FileVersion_Field fvv
JOIN FileVersion fv ON fv.id_fileversion = fvv.id_fileversion
JOIN Field f ON f.id_field = fvv.id_field
WHERE fv.id_file = 1 AND fv.version = 2;
-- id_fieldmodel | value
-- ---------------+-------
--              2 | 25
All that remains is to LEFT JOIN the two derived tables, allowing NULL values for the missing fields:
SELECT fm.type, f.value
FROM (
    SELECT DISTINCT(fm.id_fieldmodel), fm.type
    FROM FieldModel fm
    LEFT JOIN Field f ON fm.id_fieldmodel = f.id_fieldmodel
    LEFT JOIN FileVersion_Field fvf ON f.id_field = fvf.id_field
    LEFT JOIN FileVersion fv ON fv.id_fileversion = fvf.id_fileversion
    WHERE fv.id_file = 1
) fm
LEFT JOIN (
    SELECT f.id_fieldmodel, f.value
    FROM FileVersion_Field fvv
    JOIN FileVersion fv ON fv.id_fileversion = fvv.id_fileversion
    JOIN Field f ON f.id_field = fvv.id_field
    WHERE fv.id_file = 1 AND fv.version = 2
) f ON (f.id_fieldmodel = fm.id_fieldmodel OR f.id_fieldmodel IS NULL);
-- type | value
-- ------+-------
-- Name |
-- Age | 25
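
A side note on that last join: because the second subquery only returns Fields that matched the requested version, its id_fieldmodel should never be NULL, so the OR f.id_fieldmodel IS NULL test presumably never fires. The join condition can likely be reduced to a plain equality; here is a sketch against the sample schema above, which should return the same two rows:
SELECT fm.type, f.value
FROM (
    SELECT DISTINCT fm.id_fieldmodel, fm.type
    FROM FieldModel fm
    LEFT JOIN Field f ON fm.id_fieldmodel = f.id_fieldmodel
    LEFT JOIN FileVersion_Field fvf ON f.id_field = fvf.id_field
    LEFT JOIN FileVersion fv ON fv.id_fileversion = fvf.id_fileversion
    WHERE fv.id_file = 1
) fm
LEFT JOIN (
    SELECT f.id_fieldmodel, f.value
    FROM FileVersion_Field fvf
    JOIN FileVersion fv ON fv.id_fileversion = fvf.id_fileversion
    JOIN Field f ON f.id_field = fvf.id_field
    WHERE fv.id_file = 1 AND fv.version = 2
) f ON f.id_fieldmodel = fm.id_fieldmodel;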

Related

How can I choose which column I refer to?

I have 2 tables with some duplicate columns. I need to join them without explicitly picking which columns I want to select:
CREATE TABLE IF NOT EXISTS animals (
    id int(6) unsigned NOT NULL,
    cond varchar(200) NOT NULL,
    animal varchar(200) NOT NULL,
    PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;
INSERT INTO animals (id, cond, animal) VALUES
('1', 'fat', 'cat'),
('2', 'slim', 'cat'),
('3', 'fat', 'dog'),
('4', 'slim', 'dog'),
('5', 'normal', 'dog');
CREATE TABLE IF NOT EXISTS names (
    id int(6) unsigned NOT NULL,
    name varchar(200) NOT NULL,
    animal varchar(200) NOT NULL,
    PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;
INSERT INTO names (id, name, animal) VALUES
('1', 'LuLu', 'cat'),
('2', 'DoDo', 'cat'),
('3', 'Jack', 'dog'),
('4', 'Shorty', 'dog'),
('5', 'Stinky', 'dog');
SELECT *
FROM animals AS a
JOIN names as n
ON a.id = n.id;
Result:
| id | cond | animal | id | name | animal |
| --- | ------ | ------ | --- | ------ | ------ |
| 1 | fat | cat | 1 | LuLu | cat |
| 2 | slim | cat | 2 | DoDo | cat |
| 3 | fat | dog | 3 | Jack | dog |
| 4 | slim | dog | 4 | Shorty | dog |
| 5 | normal | dog | 5 | Stinky | dog |
But when I try to run another query against the resulting table, like:
SELECT name
FROM
(
SELECT *
FROM animals AS a
JOIN names as n
ON a.id = n.id
) as res_tbl
WHERE name = 'LuLu';
I get:
Query Error: Error: ER_DUP_FIELDNAME: Duplicate column name 'id'
Is there any way of avoiding it other than removing the duplicate columns from the first query?
P.S. In fact I am using PostgreSQL; I wrote the schema in MySQL syntax because I am more used to it.
You have columns with the same name in both tables, which causes ambiguity.
If you just want the name column in the outer query, then select that column only in the subquery:
select name
from (
select n.name
from animals a
inner join names n using (id)
) t
where ...
If you want more columns, then you would typically alias the identically named columns to remove the ambiguity; as for the joining column (here, id), the using() syntax is sufficient. So, for example:
select ...
from (
select id, a.cond, a.animal as animal1, n.name, n.animal as animal2
from animals a
inner join names n using (id)
) t
where ...
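Putting the pieces together against the sample tables above, a complete sketch (untested) that avoids the duplicate-column error and reuses the original filter might look like this:
SELECT res_tbl.name, res_tbl.cond
FROM (
    SELECT id, a.cond, a.animal AS animal1, n.name, n.animal AS animal2
    FROM animals a
    INNER JOIN names n USING (id)
) AS res_tbl
WHERE res_tbl.name = 'LuLu';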
You may also select the records themselves, instead of the columns from them, and then access their fields in an outer query using the usual record.column syntax:
SELECT a.cond animal_cond,
n.name animal_name
FROM (
SELECT a, n
FROM animals AS a
JOIN names as n
ON a.id = n.id
) t

Get records having the same value in 2 columns but a different value in a 3rd column

I am having trouble writing a query that will return all records where 2 columns have the same value but a different value in a 3rd column. I am looking for the records where the Item_Type and Location_ID are the same, but the Sub_Location_ID is different.
The table looks like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 1 | 00001 | 20 | 78 |
| 2 | 00001 | 110 | 124 |
| 3 | 00001 | 110 | 124 |
| 4 | 00002 | 3 | 18 |
| 5 | 00002 | 3 | 25 |
+---------+-----------+-------------+-----------------+
The result I am trying to get would look like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 4 | 00002 | 3 | 18 |
| 5 | 00002 | 3 | 25 |
+---------+-----------+-------------+-----------------+
I have been trying to use the following query:
SELECT *
FROM Table1
WHERE Item_Type IN (
SELECT Item_Type
FROM Table1
GROUP BY Item_Type
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
)
But it returns all records with the same Item_Type and a different Sub_Location_ID, not all records with the same Item_Type AND Location_ID but a different Sub_Location_ID.
This should do the trick...
-- some test data...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
BEGIN DROP TABLE #TestData; END;
CREATE TABLE #TestData (
    Item_ID INT NOT NULL PRIMARY KEY,
    Item_Type CHAR(5) NOT NULL,
    Location_ID INT NOT NULL,
    Sub_Location_ID INT NOT NULL
);
INSERT #TestData (Item_ID, Item_Type, Location_ID, Sub_Location_ID) VALUES
(1, '00001', 20, 78),
(2, '00001', 110, 124),
(3, '00001', 110, 124),
(4, '00002', 3, 18),
(5, '00002', 3, 25);
-- adding a covering index will eliminate the sort operation...
CREATE NONCLUSTERED INDEX ix_indexname ON #TestData (Item_Type, Location_ID, Sub_Location_ID, Item_ID);
-- the actual solution...
WITH cte_count_group AS (
    SELECT
        td.Item_ID,
        td.Item_Type,
        td.Location_ID,
        td.Sub_Location_ID,
        cnt_grp_2 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID),
        cnt_grp_3 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID, td.Sub_Location_ID)
    FROM #TestData td
)
SELECT
    cg.Item_ID,
    cg.Item_Type,
    cg.Location_ID,
    cg.Sub_Location_ID
FROM cte_count_group cg
WHERE cg.cnt_grp_2 > 1
  AND cg.cnt_grp_3 < cg.cnt_grp_2;
You can use exists:
select t.*
from Table1 t
where exists (select 1
              from Table1 t1
              where t.Item_Type = t1.Item_Type and
                    t.Location_ID = t1.Location_ID and
                    t.Sub_Location_ID <> t1.Sub_Location_ID
             );
SQL Server has no vector IN, so you can emulate it with a little trick, assuming '#' is an illegal character for Item_Type:
SELECT *
FROM Table1
WHERE Item_Type+'#'+Cast(Location_ID as varchar(20)) IN (
SELECT Item_Type+'#'+Cast(Location_ID as varchar(20))
FROM Table1
GROUP BY Item_Type, Location_ID
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
);
The downside is that the expression in the WHERE clause is non-sargable.
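One sargable alternative (a sketch against the same Table1, not tested) is to join back to a grouped derived table instead of building a concatenated key:
SELECT t.*
FROM Table1 t
INNER JOIN (
    SELECT Item_Type, Location_ID
    FROM Table1
    GROUP BY Item_Type, Location_ID
    HAVING COUNT(DISTINCT Sub_Location_ID) > 1
) dup ON dup.Item_Type = t.Item_Type
     AND dup.Location_ID = t.Location_ID;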
I think you can use exists:
select t1.*
from table1 t1
where exists (select 1
from table1 tt1
where tt1.Item_Type = t1.Item_Type and
tt1.Location_ID = t1.Location_ID and
tt1.Sub_Location_ID <> t1.Sub_Location_ID
);

How to copy rows into a new one-to-many relationship

I'm trying to copy a set of data in a one-to-many relationship to create a new set of the same data in a new, but unrelated, one-to-many relationship. Let's call them groups and items. Groups have a 1-* relation with items: one group has many items.
I've tried to create a CTE to do this; however, I can't get the items inserted (in y), as the newly inserted groups don't have any items associated with them yet. I think I need to be able to access OLD and NEW like you would in a trigger, but I can't work out how to do this.
I think I could solve this by introducing a previous parent id into the templateitem table, or maybe a temp table with the data required to enable me to join on that, but I was wondering if it is possible to solve it this way?
SQL Fiddle keeps breaking on me, so I've put the code here as well:
DROP TABLE IF EXISTS meta.templateitem;
DROP TABLE IF EXISTS meta.templategroup;
CREATE TABLE meta.templategroup (
    templategroup_id serial PRIMARY KEY,
    groupname text,
    roworder int
);
CREATE TABLE meta.templateitem (
    templateitem_id serial PRIMARY KEY,
    itemname text,
    templategroup_id INTEGER NOT NULL REFERENCES meta.templategroup(templategroup_id)
);
INSERT INTO meta.templategroup (groupname, roworder) values ('Group1', 1), ('Group2', 2);
INSERT INTO meta.templateitem (itemname, templategroup_id) values ('Item1A',1), ('Item1B',1), ('Item2A',2);
WITH
x AS (
    INSERT INTO meta.templategroup (groupname, roworder)
    SELECT DISTINCT groupname || '_v1', roworder
    FROM meta.templategroup
    WHERE templategroup_id IN (1,2)
    RETURNING groupname, templategroup_id, roworder
),
y AS (
    INSERT INTO meta.templateitem (itemname, templategroup_id)
    SELECT itemname, x.templategroup_id
    FROM meta.templateitem i
    INNER JOIN x ON x.templategroup_id = i.templategroup_id
    RETURNING *
)
SELECT * FROM y;
Use an auxiliary column templategroup.old_id:
ALTER TABLE meta.templategroup ADD old_id int;
WITH x AS (
    INSERT INTO meta.templategroup (groupname, roworder, old_id)
    SELECT DISTINCT groupname || '_v1', roworder, templategroup_id
    FROM meta.templategroup
    WHERE templategroup_id IN (1,2)
    RETURNING templategroup_id, old_id
),
y AS (
    INSERT INTO meta.templateitem (itemname, templategroup_id)
    SELECT itemname, x.templategroup_id
    FROM meta.templateitem i
    INNER JOIN x ON x.old_id = i.templategroup_id
    RETURNING *
)
SELECT * FROM y;
 templateitem_id | itemname | templategroup_id
-----------------+----------+------------------
               4 | Item1A   |                3
               5 | Item1B   |                3
               6 | Item2A   |                4
(3 rows)
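If the helper column is only needed for the copy, it can presumably be dropped again once the rows are in place:
ALTER TABLE meta.templategroup DROP COLUMN old_id;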
It's impossible to do that in a single plain SQL query without an additional column; you have to store the old ids somewhere. As an alternative, you can use plpgsql and an anonymous code block:
Before:
select *
from meta.templategroup
join meta.templateitem using (templategroup_id);
 templategroup_id | groupname | roworder | templateitem_id | itemname
------------------+-----------+----------+-----------------+----------
                1 | Group1    |        1 |               1 | Item1A
                1 | Group1    |        1 |               2 | Item1B
                2 | Group2    |        2 |               3 | Item2A
(3 rows)
Insert:
do $$
declare
    grp record;
begin
    for grp in
        select distinct groupname || '_v1' groupname, roworder, templategroup_id
        from meta.templategroup
        where templategroup_id in (1,2)
    loop
        with insert_group as (
            insert into meta.templategroup (groupname, roworder)
            values (grp.groupname, grp.roworder)
            returning templategroup_id
        )
        insert into meta.templateitem (itemname, templategroup_id)
        select itemname || '_v1', g.templategroup_id
        from meta.templateitem i
        join insert_group g on grp.templategroup_id = i.templategroup_id;
    end loop;
end $$;
After:
select *
from meta.templategroup
join meta.templateitem using (templategroup_id);
 templategroup_id | groupname | roworder | templateitem_id | itemname
------------------+-----------+----------+-----------------+-----------
                1 | Group1    |        1 |               1 | Item1A
                1 | Group1    |        1 |               2 | Item1B
                2 | Group2    |        2 |               3 | Item2A
                3 | Group1_v1 |        1 |               4 | Item1A_v1
                3 | Group1_v1 |        1 |               5 | Item1B_v1
                4 | Group2_v1 |        2 |               6 | Item2A_v1
(6 rows)

Fast query to do normalization on SQL data

I have some data that I want to normalize. Specifically, I'm normalizing it so I can process the normalized portions without having to worry about duplicates. What I'm doing is:
INSERT INTO new_table (a, b, c)
SELECT DISTINCT a,b,c
FROM old_table;
UPDATE old_table
SET abc_id = new_table.id
FROM new_table
WHERE new_table.a = old_table.a
AND new_table.b = old_table.b
AND new_table.c = old_table.c;
First off, it seems as if there should be a better way of doing this; the inherent process of finding the distinct data could produce a list of the members that belong to each group. Second, and more importantly, the INSERT takes a couple and the UPDATE takes FOREVER (I don't actually have a value for how long it takes yet because it's still running). I'm using PostgreSQL. Is there a better way of doing this (perhaps all in one query)?
This is my other answer, extended to three columns:
-- Some test data
CREATE TABLE the_table
( id SERIAL NOT NULL PRIMARY KEY
, name varchar
, a INTEGER
, b varchar
, c varchar
);
INSERT INTO the_table(name, a,b,c) VALUES
( 'Chimpanzee' , 1, 'mammals', 'apes' )
,( 'Urang Utang' , 1, 'mammals', 'apes' )
,( 'Homo Sapiens' , 1, 'mammals', 'apes' )
,( 'Mouse' , 2, 'mammals', 'rodents' )
,( 'Rat' , 2, 'mammals', 'rodents' )
,( 'Cat' , 3, 'mammals', 'felix' )
,( 'Dog' , 3, 'mammals', 'canae' )
;
-- [empty] table to contain the "squeezed out" domain {a,b,c}
CREATE TABLE abc_table
( id SERIAL NOT NULL PRIMARY KEY
, a INTEGER
, b varchar
, c varchar
, UNIQUE (a,b,c)
);
-- The original table needs a "link" to the new table
ALTER TABLE the_table
ADD column abc_id INTEGER -- NOT NULL
REFERENCES abc_table(id)
;
-- FK constraints are helped a lot by a supportive index.
CREATE INDEX abc_table_fk ON the_table (abc_id);
-- Chained query to:
-- * populate the domain table
-- * initialize the FK column in the original table
WITH ins AS (
INSERT INTO abc_table(a,b,c)
SELECT DISTINCT a,b,c
FROM the_table a
RETURNING *
)
UPDATE the_table ani
SET abc_id = ins.id
FROM ins
WHERE ins.a = ani.a
AND ins.b = ani.b
AND ins.c = ani.c
;
-- Now that we have the FK pointing to the new table,
-- we can drop the redundant columns.
ALTER TABLE the_table DROP COLUMN a, DROP COLUMN b, DROP COLUMN c;
SELECT * FROM the_table;
SELECT * FROM abc_table;
-- show it to the world
SELECT a.*
, c.a, c.b, c.c
FROM the_table a
JOIN abc_table c ON c.id = a.abc_id
;
Results:
CREATE TABLE
INSERT 0 7
CREATE TABLE
ALTER TABLE
CREATE INDEX
UPDATE 7
ALTER TABLE
 id | name         | abc_id
----+--------------+--------
  1 | Chimpanzee   |      4
  2 | Urang Utang  |      4
  3 | Homo Sapiens |      4
  4 | Mouse        |      3
  5 | Rat          |      3
  6 | Cat          |      1
  7 | Dog          |      2
(7 rows)

 id | a | b       | c
----+---+---------+---------
  1 | 3 | mammals | felix
  2 | 3 | mammals | canae
  3 | 2 | mammals | rodents
  4 | 1 | mammals | apes
(4 rows)

 id | name         | abc_id | a | b       | c
----+--------------+--------+---+---------+---------
  1 | Chimpanzee   |      4 | 1 | mammals | apes
  2 | Urang Utang  |      4 | 1 | mammals | apes
  3 | Homo Sapiens |      4 | 1 | mammals | apes
  4 | Mouse        |      3 | 2 | mammals | rodents
  5 | Rat          |      3 | 2 | mammals | rodents
  6 | Cat          |      1 | 3 | mammals | felix
  7 | Dog          |      2 | 3 | mammals | canae
(7 rows)
Came up with a way to do this on my own:
BEGIN;
CREATE TEMPORARY TABLE new_table_temp (
    LIKE new_table INCLUDING DEFAULTS,
    old_ids integer[]
)
ON COMMIT DROP;
INSERT INTO new_table_temp (a, b, c, old_ids)
SELECT a, b, c, array_agg(id) AS old_ids
FROM old_table
GROUP BY a, b, c;
INSERT INTO new_table (id, a, b, c)
SELECT id, a, b, c
FROM new_table_temp;
UPDATE old_table
SET abc_id = new_table_temp.id
FROM new_table_temp
WHERE old_table.id = ANY(new_table_temp.old_ids);
COMMIT;
This at least is what I was looking for. I'll update this as to whether it worked quickly. The EXPLAIN seems to form a sensible plan, so I'm hopeful.

How to write the SQL to show the data in my case in Oracle?

I have a table like this:
create table tbl1
(
    id number,
    role number
);
insert into tbl1 values (1, 1);
insert into tbl1 values (2, 3);
insert into tbl1 values (1, 3);
create table tbl2
(
    role number,
    meaning varchar(50)
);
insert into tbl2 values (1, 'changing data');
insert into tbl2 values (2, 'move file');
insert into tbl2 values (3, 'dance');
I want the SQL result to look like the following:
id   role_meaning    is_permitted
1    changing data   yes
1    move file       no
1    dance           yes
2    changing data   no
2    move file       no
2    dance           yes
Please help: how can I do this? I have tried several methods but am not sure how to do it.
You can use a partitioned outer join here.
select tbl1.id,
tbl2.meaning,
case when tbl1.role is NULL then 'no' else 'yes' end is_permitted
from tbl1
partition by (id) right outer join tbl2
on tbl1.role = tbl2.role
order by tbl1.id, tbl2.role
Results:
| ID | MEANING | IS_PERMITTED |
|----|---------------|--------------|
| 1 | changing data | yes |
| 1 | move file | no |
| 1 | dance | yes |
| 2 | changing data | no |
| 2 | move file | no |
| 2 | dance | yes |
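
For what it's worth, the same result can be produced without Oracle's partitioned outer join, for example by building every (id, role) combination first and then checking which of them exist in tbl1. A portable sketch:
select ids.id,
       tbl2.meaning,
       case when tbl1.role is null then 'no' else 'yes' end is_permitted
from (select distinct id from tbl1) ids
cross join tbl2
left join tbl1 on tbl1.id = ids.id and tbl1.role = tbl2.role
order by ids.id, tbl2.role;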