How to copy rows into a new one-to-many relationship - SQL

I'm trying to copy a set of data in a one-to-many relationship to create a new set of the same data in a new, but unrelated, one-to-many relationship. Let's call them groups and items. Groups have a 1-* relation with items - one group has many items.
I've tried to create a CTE to do this, however I can't get the items inserted (in y), as the newly inserted groups don't have any items associated with them yet. I think I need to be able to access OLD and NEW the way you would in a trigger, but I can't work out how to do this.
I think I could solve this by introducing a previous-parent id into the templateitem table, or maybe a temp table with the data required to let me join on that, but I was wondering whether it is possible to solve it this way.
SQL Fiddle keeps breaking on me, so I've put the code here as well:
DROP TABLE IF EXISTS meta.templateitem;
DROP TABLE IF EXISTS meta.templategroup;
CREATE TABLE meta.templategroup (
templategroup_id serial PRIMARY KEY,
groupname text,
roworder int
);
CREATE TABLE meta.templateitem (
templateitem_id serial PRIMARY KEY,
itemname text,
templategroup_id INTEGER NOT NULL REFERENCES meta.templategroup(templategroup_id)
);
INSERT INTO meta.templategroup (groupname, roworder) values ('Group1', 1), ('Group2', 2);
INSERT INTO meta.templateitem (itemname, templategroup_id) values ('Item1A',1), ('Item1B',1), ('Item2A',2);
WITH
x AS (
INSERT INTO meta.templategroup (groupname, roworder)
SELECT DISTINCT groupname || '_v1', roworder
FROM meta.templategroup
WHERE templategroup_id IN (1,2)
RETURNING groupname, templategroup_id, roworder
),
y AS (
INSERT INTO meta.templateitem (itemname, templategroup_id)
SELECT itemname, x.templategroup_id
FROM meta.templateitem i
INNER JOIN x ON x.templategroup_id = i.templategroup_id
RETURNING *
)
SELECT * FROM y;

Use an auxiliary column templategroup.old_id:
ALTER TABLE meta.templategroup ADD old_id int;
WITH x AS (
INSERT INTO meta.templategroup (groupname, roworder, old_id)
SELECT DISTINCT groupname || '_v1', roworder, templategroup_id
FROM meta.templategroup
WHERE templategroup_id IN (1,2)
RETURNING templategroup_id, old_id
),
y AS (
INSERT INTO meta.templateitem (itemname, templategroup_id)
SELECT itemname, x.templategroup_id
FROM meta.templateitem i
INNER JOIN x ON x.old_id = i.templategroup_id
RETURNING *
)
SELECT * FROM y;
 templateitem_id | itemname | templategroup_id
-----------------+----------+------------------
               4 | Item1A   |                3
               5 | Item1B   |                3
               6 | Item2A   |                4
(3 rows)
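If the auxiliary column is only needed for this one copy, it can be dropped again once the copy has run:
ALTER TABLE meta.templategroup DROP COLUMN old_id;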
It's impossible to do that in a single plain SQL query without an additional column - you have to store the old ids somewhere. As an alternative you can use plpgsql and an anonymous code block:
Before:
select *
from meta.templategroup
join meta.templateitem using (templategroup_id);
 templategroup_id | groupname | roworder | templateitem_id | itemname
------------------+-----------+----------+-----------------+----------
                1 | Group1    |        1 |               1 | Item1A
                1 | Group1    |        1 |               2 | Item1B
                2 | Group2    |        2 |               3 | Item2A
(3 rows)
Insert:
do $$
declare
grp record;
begin
for grp in
select distinct groupname || '_v1' groupname, roworder, templategroup_id
from meta.templategroup
where templategroup_id in (1,2)
loop
with insert_group as (
insert into meta.templategroup (groupname, roworder)
values (grp.groupname, grp.roworder)
returning templategroup_id
)
insert into meta.templateitem (itemname, templategroup_id)
select itemname || '_v1', g.templategroup_id
from meta.templateitem i
join insert_group g on grp.templategroup_id = i.templategroup_id;
end loop;
end $$;
After:
select *
from meta.templategroup
join meta.templateitem using (templategroup_id);
 templategroup_id | groupname | roworder | templateitem_id | itemname
------------------+-----------+----------+-----------------+-----------
                1 | Group1    |        1 |               1 | Item1A
                1 | Group1    |        1 |               2 | Item1B
                2 | Group2    |        2 |               3 | Item2A
                3 | Group1_v1 |        1 |               4 | Item1A_v1
                3 | Group1_v1 |        1 |               5 | Item1B_v1
                4 | Group2_v1 |        2 |               6 | Item2A_v1
(6 rows)

Related

Get records having the same value in 2 columns but a different value in a 3rd column

I am having trouble writing a query that will return all records where 2 columns have the same value but a different value in a 3rd column. I am looking for the records where the Item_Type and Location_ID are the same, but the Sub_Location_ID is different.
The table looks like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 1       | 00001     | 20          | 78              |
| 2       | 00001     | 110         | 124             |
| 3       | 00001     | 110         | 124             |
| 4       | 00002     | 3           | 18              |
| 5       | 00002     | 3           | 25              |
+---------+-----------+-------------+-----------------+
The result I am trying to get would look like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 4       | 00002     | 3           | 18              |
| 5       | 00002     | 3           | 25              |
+---------+-----------+-------------+-----------------+
I have been trying to use the following query:
SELECT *
FROM Table1
WHERE Item_Type IN (
SELECT Item_Type
FROM Table1
GROUP BY Item_Type
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
)
But it returns all records with the same Item_Type and a different Sub_Location_ID, not all records with the same Item_Type AND Location_ID but a different Sub_Location_ID.
This should do the trick...
-- some test data...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
BEGIN DROP TABLE #TestData; END;
CREATE TABLE #TestData (
Item_ID INT NOT NULL PRIMARY KEY,
Item_Type CHAR(5) NOT NULL,
Location_ID INT NOT NULL,
Sub_Location_ID INT NOT NULL
);
INSERT #TestData (Item_ID, Item_Type, Location_ID, Sub_Location_ID) VALUES
(1, '00001', 20, 78),
(2, '00001', 110, 124),
(3, '00001', 110, 124),
(4, '00002', 3, 18),
(5, '00002', 3, 25);
-- adding a covering index will eliminate the sort operation...
CREATE NONCLUSTERED INDEX ix_indexname ON #TestData (Item_Type, Location_ID, Sub_Location_ID, Item_ID);
-- the actual solution...
WITH
cte_count_group AS (
SELECT
td.Item_ID,
td.Item_Type,
td.Location_ID,
td.Sub_Location_ID,
cnt_grp_2 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID),
cnt_grp_3 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID, td.Sub_Location_ID)
FROM
#TestData td
)
SELECT
cg.Item_ID,
cg.Item_Type,
cg.Location_ID,
cg.Sub_Location_ID
FROM
cte_count_group cg
WHERE
cg.cnt_grp_2 > 1
AND cg.cnt_grp_3 < cg.cnt_grp_2;
cnt_grp_2 counts the rows that share an (Item_Type, Location_ID) pair, while cnt_grp_3 counts the rows that additionally share the Sub_Location_ID; whenever cnt_grp_3 is smaller, at least one other row in the same group carries a different Sub_Location_ID.
You can use exists:
select t.*
from Table1 t
where exists (select 1
              from Table1 t1
              where t.Item_Type = t1.Item_Type and
                    t.Location_ID = t1.Location_ID and
                    t.Sub_Location_ID <> t1.Sub_Location_ID
             );
SQL Server has no vector IN, so you can emulate it with a little trick, assuming '#' is an illegal character in Item_Type:
SELECT *
FROM Table1
WHERE Item_Type+'#'+Cast(Location_ID as varchar(20)) IN (
SELECT Item_Type+'#'+Cast(Location_ID as varchar(20))
FROM Table1
GROUP BY Item_Type, Location_ID
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
);
The downside is that the expression in the WHERE clause is non-sargable.
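As an aside (not part of the original answers): engines that do support row-value constructors, such as PostgreSQL and MySQL, can express the vector IN directly, with no concatenation trick:
SELECT *
FROM Table1
WHERE (Item_Type, Location_ID) IN (
    SELECT Item_Type, Location_ID
    FROM Table1
    GROUP BY Item_Type, Location_ID
    HAVING COUNT(DISTINCT Sub_Location_ID) > 1
);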
I think you can use exists:
select t1.*
from table1 t1
where exists (select 1
from table1 tt1
where tt1.Item_Type = t1.Item_Type and
tt1.Location_ID = t1.Location_ID and
tt1.Sub_Location_ID <> t1.Sub_Location_ID
);

Generating random data in two tables with 1:M relationship with more cells of unique data in the second table in PostgreSQL

I have two tables: first_table and other_table, with a 1:M relationship between them. I'm generating 3 rows of random data for the pr_key column of first_table.
For each pr_key I then need to generate random numbers in the second_code column of other_table, so that there are 9 rows in total.
The problem with my approach is that the random numbers in the second_code column repeat for each pr_key, but they need to be different.
Additionally, other_table has a constraint that checks that the (pr_key, second_code) pairs are unique.
with oper as (
INSERT INTO first_table (
pr_key
)
SELECT
pr_key
FROM (
SELECT(
SELECT (random()*10)::int+(gen*0) as pr_key
),
gen as id
FROM generate_series(1,3) as gen
) main
RETURNING pr_key)
INSERT INTO other_table(pr_key, second_code)
SELECT pr_key, second_code
FROM oper, (
SELECT
(
SELECT 1+(random()*10)::int+(gen*0) as second_code
),
gen as id
FROM generate_series(1,3) as gen
) AS gener
Try with this syntax. The key difference is that the second_code expression now sits in the outer SELECT list and references id from the joined series (the id*0 term), so PostgreSQL cannot fold the scalar subquery into a constant; random() is re-evaluated for each of the nine output rows:
CREATE TABLE t1 (pr_key int);
CREATE TABLE t2 (pr_key int, second_code int);
with c1 as
(
insert into t1
select pr_key
from (
select (select (random()*10)::int+(gen*0) as pr_key),
gen as id
from generate_series(1,3) as gen
) t
returning pr_key
)
insert into t2 (pr_key, second_code)
select pr_key, (select (random()*10)::int+(id*0))
from c1, (select gen as id from generate_series(1,3) as gen) t2;
select * from t1;
| pr_key |
| -----: |
|      2 |
|      7 |
|      9 |
select * from t2;
| pr_key | second_code |
| -----: | ----------: |
|      2 |           5 |
|      2 |           7 |
|      2 |           1 |
|      7 |           4 |
|      7 |           0 |
|      7 |           3 |
|      9 |           4 |
|      9 |          10 |
|      9 |           2 |
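One caveat: random() can still produce the same second_code twice for a single pr_key, so the unique (pr_key, second_code) constraint mentioned in the question can still be violated. A sketch of one way to guarantee distinct codes per group, against the same t1/t2 tables: draw each group's codes from a shuffled series instead of independent random() calls.
insert into t2 (pr_key, second_code)
select c1.pr_key, s.code
from t1 c1
cross join lateral (
    -- the c1.pr_key*0 term correlates the subquery with the outer row,
    -- forcing a fresh shuffle per group (the same trick as gen*0 above)
    select gen + (c1.pr_key * 0) as code
    from generate_series(1, 10) as gen
    order by random()
    limit 3
) s;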

Update table in Postgresql by grouping rows

I want to update a table by grouping (or combining) some rows together based on a certain criteria. I basically have this table currently (I want to group by 'id_number' and 'date' and sum 'count'):
Table: foo
----------------------------
| id_number | date | count |
----------------------------
| 1         | 2001 | 1     |
| 1         | 2001 | 2     |
| 1         | 2002 | 1     |
| 2         | 2001 | 6     |
| 2         | 2003 | 12    |
| 2         | 2003 | 2     |
----------------------------
And I want to get this:
Table: foo
----------------------------
| id_number | date | count |
----------------------------
| 1         | 2001 | 3     |
| 1         | 2002 | 1     |
| 2         | 2001 | 6     |
| 2         | 2003 | 14    |
----------------------------
I know that I can easily create a new table with the pertinent info. But how can I modify an existing table like this without making a "temp" table? (Note: I have nothing against using a temporary table, I'm just interested in seeing if I can do it this way)
If you want to delete rows, you can add a primary key (to distinguish the rows) and use two statements: an UPDATE for the sum and a DELETE to get rid of the extra rows.
You can do something like this:
create table foo (
id integer primary key,
id_number integer,
date integer,
count integer
);
insert into foo values
(1, 1 , 2001 , 1 ),
(2, 1 , 2001 , 2 ),
(3, 1 , 2002 , 1 ),
(4, 2 , 2001 , 6 ),
(5, 2 , 2003 , 12 ),
(6, 2 , 2003 , 2 );
select * from foo;
update foo
set count = count_sum
from (
select id, id_number, date,
sum(count) over (partition by id_number, date) as count_sum
from foo
) foo_added
where foo.id_number = foo_added.id_number
and foo.date = foo_added.date;
delete from foo
using (
select id, id_number, date,
row_number() over (partition by id_number, date order by id) as inner_order
from foo
) foo_ranked
where foo.id = foo_ranked.id
and foo_ranked.inner_order <> 1;
select * from foo;
You can try it here: http://rextester.com/PIL12447
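As an aside (not from the original answer): on PostgreSQL 9.1+ you could also collapse the two statements into one query with data-modifying CTEs, since the DELETE and the UPDATE touch disjoint sets of rows. A sketch against the same foo table as above:
with summed as (
    select min(id) as keep_id, id_number, date, sum(count) as count_sum
    from foo
    group by id_number, date
),
removed as (
    -- delete every row that is not the keeper of its (id_number, date) group
    delete from foo
    using summed s
    where foo.id_number = s.id_number
      and foo.date = s.date
      and foo.id <> s.keep_id
)
-- fold the group total into the surviving row
update foo
set count = s.count_sum
from summed s
where foo.id = s.keep_id;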
With only one UPDATE (but with a trigger), you can set count to NULL and have a trigger DELETE the row in that case.
create table foo (
id integer primary key,
id_number integer,
date integer,
count integer
);
create function delete_if_count_is_null() returns trigger
language plpgsql as
$BODY$
begin
if new.count is null then
delete from foo
where id = new.id;
end if;
return new;
end;
$BODY$;
create trigger delete_if_count_is_null
after update on foo
for each row
execute procedure delete_if_count_is_null();
insert into foo values
(1, 1 , 2001 , 1 ),
(2, 1 , 2001 , 2 ),
(3, 1 , 2002 , 1 ),
(4, 2 , 2001 , 6 ),
(5, 2 , 2003 , 12 ),
(6, 2 , 2003 , 2 );
select * from foo;
update foo
set count = case when inner_order = 1 then count_sum else null end
from (
select id, id_number, date,
sum(count) over (partition by id_number, date) as count_sum,
row_number() over (partition by id_number, date order by id) as inner_order
from foo
) foo_added
where foo.id_number = foo_added.id_number
and foo.date = foo_added.date
and foo.id = foo_added.id;
select * from foo;
You can try it in: http://rextester.com/MWPRG10961

Fast query to do normalization on SQL data

I have some data that I want to normalize. Specifically I'm normalizing it so I can process the portions getting normalized without having to worry about duplicates. What I'm doing is:
INSERT INTO new_table (a, b, c)
SELECT DISTINCT a,b,c
FROM old_table;
UPDATE old_table
SET abc_id = new_table.id
FROM new_table
WHERE new_table.a = old_table.a
AND new_table.b = old_table.b
AND new_table.c = old_table.c;
First off, it seems as if there should be a better way of doing this - the process that finds the distinct data could in principle also produce the list of members belonging to each group. Second, and more important, the INSERT finishes reasonably quickly but the UPDATE takes forever (I don't have a figure for how long yet, because it's still running). I'm using PostgreSQL. Is there a better way of doing this (perhaps all in one query)?
This is my other answer, extended to three columns:
-- Some test data
CREATE TABLE the_table
( id SERIAL NOT NULL PRIMARY KEY
, name varchar
, a INTEGER
, b varchar
, c varchar
);
INSERT INTO the_table(name, a,b,c) VALUES
( 'Chimpanzee' , 1, 'mammals', 'apes' )
,( 'Urang Utang' , 1, 'mammals', 'apes' )
,( 'Homo Sapiens' , 1, 'mammals', 'apes' )
,( 'Mouse' , 2, 'mammals', 'rodents' )
,( 'Rat' , 2, 'mammals', 'rodents' )
,( 'Cat' , 3, 'mammals', 'felix' )
,( 'Dog' , 3, 'mammals', 'canae' )
;
-- [empty] table to contain the "squeezed out" domain {a,b,c}
CREATE TABLE abc_table
( id SERIAL NOT NULL PRIMARY KEY
, a INTEGER
, b varchar
, c varchar
, UNIQUE (a,b,c)
);
-- The original table needs a "link" to the new table
ALTER TABLE the_table
ADD column abc_id INTEGER -- NOT NULL
REFERENCES abc_table(id)
;
-- FK constraints are helped a lot by a supportive index.
CREATE INDEX abc_table_fk ON the_table (abc_id);
-- Chained query to:
-- * populate the domain table
-- * initialize the FK column in the original table
WITH ins AS (
INSERT INTO abc_table(a,b,c)
SELECT DISTINCT a,b,c
FROM the_table a
RETURNING *
)
UPDATE the_table ani
SET abc_id = ins.id
FROM ins
WHERE ins.a = ani.a
AND ins.b = ani.b
AND ins.c = ani.c
;
-- Now that we have the FK pointing to the new table,
-- we can drop the redundant columns.
ALTER TABLE the_table DROP COLUMN a, DROP COLUMN b, DROP COLUMN c;
SELECT * FROM the_table;
SELECT * FROM abc_table;
-- show it to the world
SELECT a.*
, c.a, c.b, c.c
FROM the_table a
JOIN abc_table c ON c.id = a.abc_id
;
Results:
CREATE TABLE
INSERT 0 7
CREATE TABLE
ALTER TABLE
CREATE INDEX
UPDATE 7
ALTER TABLE
 id |     name     | abc_id
----+--------------+--------
  1 | Chimpanzee   |      4
  2 | Urang Utang  |      4
  3 | Homo Sapiens |      4
  4 | Mouse        |      3
  5 | Rat          |      3
  6 | Cat          |      1
  7 | Dog          |      2
(7 rows)
 id | a |    b    |    c
----+---+---------+---------
  1 | 3 | mammals | felix
  2 | 3 | mammals | canae
  3 | 2 | mammals | rodents
  4 | 1 | mammals | apes
(4 rows)
 id |     name     | abc_id | a |    b    |    c
----+--------------+--------+---+---------+---------
  1 | Chimpanzee   |      4 | 1 | mammals | apes
  2 | Urang Utang  |      4 | 1 | mammals | apes
  3 | Homo Sapiens |      4 | 1 | mammals | apes
  4 | Mouse        |      3 | 2 | mammals | rodents
  5 | Rat          |      3 | 2 | mammals | rodents
  6 | Cat          |      1 | 3 | mammals | felix
  7 | Dog          |      2 | 3 | mammals | canae
(7 rows)
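A possible follow-on, not part of the original answer: with the UNIQUE (a,b,c) constraint in place, later incremental loads can reuse existing domain rows idempotently on PostgreSQL 9.5+. Here staging_table is a hypothetical source of newly arrived rows:
INSERT INTO abc_table (a, b, c)
SELECT DISTINCT a, b, c
FROM staging_table -- hypothetical; substitute the real source of new rows
ON CONFLICT (a, b, c) DO NOTHING;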
Came up with a way to do this on my own:
BEGIN;
CREATE TEMPORARY TABLE new_table_temp (
LIKE new_table INCLUDING DEFAULTS, -- keep id's default so ids come from new_table's sequence
old_ids integer[]
)
ON COMMIT DROP;
INSERT INTO new_table_temp (a, b, c, old_ids)
SELECT a, b, c, array_agg(id) AS old_ids
FROM old_table
GROUP BY a, b, c;
INSERT INTO new_table (id, a, b, c)
SELECT id, a, b, c
FROM new_table_temp;
UPDATE old_table
SET abc_id = new_table_temp.id
FROM new_table_temp
WHERE old_table.id = ANY(new_table_temp.old_ids);
COMMIT;
This at least is what I was looking for. I'll update this answer once I know whether it actually runs quickly. The EXPLAIN output looks like a sensible plan, so I'm hopeful.
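If the = ANY(array) probe turns out to be the slow part, one variant worth trying (an assumption, not something from the original answer) is to unnest the arrays so the planner can use an ordinary join on old_table.id:
UPDATE old_table
SET abc_id = t.id
FROM (
    SELECT id, unnest(old_ids) AS old_id -- one row per original old_table id
    FROM new_table_temp
) t
WHERE old_table.id = t.old_id;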

Getting the Next Available Row

How can I get a list of all the JobPositionNames, picking for each name the row with the lowest JobPositionId that is still unassigned (ContactId = 0), and falling back to the assigned (ContactId = 1) row when no unassigned row exists?
Table:
| JobPositionId | JobPositionName   | JobDescriptionId | JobCategoryId | ContactId |
-------------------------------------------------------------------------------------
| 1             | Audio Cables      | 1                | 1             | 1         |
| 2             | Audio Connections | 2                | 1             | 1         |
| 3             | Audio Connections | 2                | 1             | 0         |
| 4             | Audio Connections | 2                | 1             | 0         |
| 5             | Sound Board       | 3                | 1             | 0         |
| 6             | Tent Pen          | 4                | 3             | 0         |
E.g. the result for this table should be rows 1, 3, 5 and 6.
I can't figure out the solution.
There is still a piece missing, but I can give you some code to look at. Maybe it can help you.
--create table
create table t
(
JobPositionId int identity(1,1) primary key,
JobPositionName nvarchar(100) not null,
JobDescriptionId int,
JobCategoryId int,
ContactId int
)
go
--insert values
BEGIN TRAN
INSERT INTO t VALUES ('AudioCables', 1,1,1)
INSERT INTO t VALUES ('AudioConnections',2,1,1)
INSERT INTO t VALUES ('AudioConnections',2,1,0)
INSERT INTO t VALUES ('AudioConnections',2,1,0)
INSERT INTO t VALUES ('SoundBoard',3,1,0)
INSERT INTO t VALUES ('TentPen',4,3,0)
COMMIT TRAN
GO
SELECT
Min(JobPositionId) AS JobPositionId, JobPositionName, ContactId
INTO
#tempTable
FROM
t
GROUP BY JobPositionName, ContactId
SELECT * FROM #tempTable
WHERE JobPositionId IN (
SELECT JobPositionId
FROM #tempTable
GROUP BY JobPositionName
--... something is still missing here, I can't figure it out, sorry.
)
drop table t
GO
For per-group minimum/maximum queries you can use a null self-join as well as strategies like subselects. The self-join is generally faster in MySQL.
SELECT j0.JobPositionId, j0.JobPositionName, j0.ContactId
FROM Jobs AS j0
LEFT JOIN Jobs AS j1 ON j1.JobPositionName=j0.JobPositionName
AND (
(j1.ContactId<>0)<(j0.ContactId<>0)
OR ((j1.ContactId<>0)=(j0.ContactId<>0) AND j1.JobPositionId<j0.JobPositionId)
)
WHERE j1.JobPositionName IS NULL
This says: for each JobPositionName, find the row for which there exists no other row with a lower ordering value, where the ordering value is the composite [ContactId non-zeroness, JobPositionId].
(Aside: shouldn't JobPositionName and JobCategoryId be normalised out into a table keyed on JobDescriptionId? And shouldn't unassigned ContactIds be NULL?)
SELECT jp.*
FROM (
SELECT JobPositionName, MIN(JobPositionId) AS JobPositionId, COUNT(*) AS cnt
FROM JobPositions
GROUP BY JobPositionName
) jpd
JOIN JobPositions jp
ON jp.JobPositionId =
IF(
cnt = 1,
jpd.JobPositionId,
(
SELECT MIN(JobPositionId)
FROM JobPositions jpi
WHERE jpi.JobPositionName = jpd.JobPositionName
AND jpi.ContactID = 0
)
)
Create an index on (JobPositionName, ContactId, JobPositionId) for this to work fast.
Note that it will not return jobs that have more than one position row when none of those rows has ContactID = 0.
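For what it's worth, here is a sketch (not from the original answers) of the same "prefer unassigned, then lowest id" pick using a window function, which works on SQL Server 2005+, PostgreSQL, and MySQL 8+; the table name JobPositions is assumed:
SELECT JobPositionId, JobPositionName, ContactId
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY JobPositionName
               -- rank unassigned (ContactId = 0) rows first, then lowest id
               ORDER BY CASE WHEN ContactId = 0 THEN 0 ELSE 1 END,
                        JobPositionId
           ) AS rn
    FROM JobPositions t
) ranked
WHERE rn = 1;
Against the sample data this returns rows 1, 3, 5 and 6.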