Normalization into three tables in Postgres, including one association table (SQL)

Say I have original data like so:
foo bar baz
1 a b
1 x y
2 z q
And I want to end up with three tables, of which I and III are the main tables and II is an association table between I and III, i.e.:
I
id foo
1 1
2 2
II
id I_id III_id
1 1 1
2 1 2
3 2 3
NB: I_id references I's generated serial id, not the foo value
III
id bar baz
1 a b
2 x y
3 z q
How would I go about inserting this in one go?
I have played around with CTEs, but I am stuck at the following: if I start with III and then return the IDs, I cannot see how to get back to table I, since there is nothing connecting them (yet).
My previous solutions have ended up pre-generating ID sequences, which feels so-so.

What if you generate a dense rank?
First generate one big derived table with all the information you need:
select foo,
       bar,
       baz,
       dense_rank() over (order by foo) as i_id,
       dense_rank() over (order by bar, baz) as iii_id,
       row_number() over (order by 1) as ii_id
from main_table
Then you just have to copy the distinct rows into each target table.
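To make the dense-rank idea concrete, here is a small pure-Python sketch (not SQL, just an illustration) that computes the same three ids over the sample data from the question; the ranking logic mirrors the window functions above:

```python
# Sample rows from the question: (foo, bar, baz)
rows = [(1, 'a', 'b'), (1, 'x', 'y'), (2, 'z', 'q')]

def dense_rank(values):
    """Map each distinct value to its 1-based rank in sorted order,
    like dense_rank() over (order by ...)."""
    return {v: i + 1 for i, v in enumerate(sorted(set(values)))}

i_ranks = dense_rank([r[0] for r in rows])            # over (order by foo)
iii_ranks = dense_rank([(r[1], r[2]) for r in rows])  # over (order by bar, baz)

# The "big table": each source row annotated with its derived ids
big = [
    {'foo': foo, 'bar': bar, 'baz': baz,
     'i_id': i_ranks[foo], 'iii_id': iii_ranks[(bar, baz)], 'ii_id': n + 1}
    for n, (foo, bar, baz) in enumerate(rows)
]

# "Transfer with a distinct" into the three target tables
table_i = sorted({(r['i_id'], r['foo']) for r in big})
table_iii = sorted({(r['iii_id'], r['bar'], r['baz']) for r in big})
table_ii = [(r['ii_id'], r['i_id'], r['iii_id']) for r in big]

print(table_i)    # [(1, 1), (2, 2)]
print(table_iii)  # [(1, 'a', 'b'), (2, 'x', 'y'), (3, 'z', 'q')]
print(table_ii)   # [(1, 1, 1), (2, 1, 2), (3, 2, 3)]
```

The output matches the target tables I, II, and III in the question exactly.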

Start with the "main" tables: create the two main entities, then use their IDs to insert a record into the "connection" table between them. You can use a CTE for that, of course (I assume that "main" tables I and III have a default nextval(..) in the PK column, pulling the next ID from a sequence):
with ins1 as (
    insert into tabl1(foo)
    values (...)
    returning *
), ins3 as (
    insert into tabl3(bar, baz)
    values (.., ..)
    returning *
)
insert into tabl2(i_id, iii_id)
select ins1.id, ins3.id
from ins1, ins3  -- implicit CROSS JOIN here:
                 -- we assume that only a single row was
                 -- inserted into each "main" table,
                 -- or you need the Cartesian product inserted into II
returning *;
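SQLite cannot express data-modifying CTEs, but the same "insert the parents first, then use their generated ids for the association row" flow can be checked procedurally with Python's sqlite3. Table and column names (tabl1, tabl2, tabl3) follow the sketch above, and lastrowid stands in for RETURNING:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Schemas mirroring the answer's tabl1 / tabl3 / tabl2 sketch
cur.executescript("""
    CREATE TABLE tabl1 (id INTEGER PRIMARY KEY, foo INT);
    CREATE TABLE tabl3 (id INTEGER PRIMARY KEY, bar TEXT, baz TEXT);
    CREATE TABLE tabl2 (id INTEGER PRIMARY KEY, i_id INT, iii_id INT);
""")

# Same flow as the data-modifying CTE: insert both "main" rows first...
cur.execute("INSERT INTO tabl1 (foo) VALUES (?)", (1,))
i_id = cur.lastrowid            # stands in for "returning id"
cur.execute("INSERT INTO tabl3 (bar, baz) VALUES (?, ?)", ('a', 'b'))
iii_id = cur.lastrowid
# ...then link them in the association table
cur.execute("INSERT INTO tabl2 (i_id, iii_id) VALUES (?, ?)", (i_id, iii_id))
conn.commit()

print(cur.execute("SELECT i_id, iii_id FROM tabl2").fetchall())  # [(1, 1)]
```

In Postgres the CTE form is preferable because the whole thing is one atomic statement; the procedural version needs an explicit transaction to get the same guarantee.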

Related

How to pull up a root element in a table referencing itself with a foreign key? (a loop?)

For instance, let's say that you have a table Person like so:
Id Name Birthdate Parent
1 Hans 1960/10/15 null
2 Svend 1985/01/23 1
3 Peter 2004/03/02 2
Parent is a foreign key on the Person table.
I want to go back all the way to the oldest parent starting from a child.
For example, starting from Peter, is it possible to retrieve Hans in SQL?
There can be possibly dozens of intermediary rows between the starting row and the ending row.
A Recursive CTE (Recursive Common Table Expression) will do what you want:
with recursive x as (
    select *, 1 as my_level
    from my_table
    where id = 3  -- Peter's id
    union all
    select t.*, x.my_level + 1
    from my_table t
    join x on x.parent = t.id
)
select * from x
order by my_level desc
limit 1;
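The recursive CTE is standard SQL and runs unchanged on SQLite, so here is a quick runnable check via Python's sqlite3 (the table is named person here; data follows the question):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE person (id INT, name TEXT, birthdate TEXT, parent INT);
    INSERT INTO person VALUES
        (1, 'Hans',  '1960-10-15', NULL),
        (2, 'Svend', '1985-01-23', 1),
        (3, 'Peter', '2004-03-02', 2);
""")

# Walk up the parent chain from Peter; the deepest level is the root ancestor
root = cur.execute("""
    WITH RECURSIVE x AS (
        SELECT *, 1 AS my_level FROM person WHERE id = 3   -- Peter's id
        UNION ALL
        SELECT t.*, x.my_level + 1
        FROM person t
        JOIN x ON x.parent = t.id
    )
    SELECT name FROM x ORDER BY my_level DESC LIMIT 1
""").fetchone()

print(root)  # ('Hans',)
```

The recursion stops by itself at Hans, because his parent is NULL and the join condition then matches no further rows.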

SELECT VALUES in Teradata

I know that it's possible in other SQL flavors (T-SQL) to "select" provided data without a table. Like:
SELECT *
FROM (VALUES (1,2), (3,4)) tbl
How can I do this using Teradata?
Teradata has strange syntax for this:
select t.*
from (
    select * from (select 1 as a, 2 as b) x
    union all
    select * from (select 3 as a, 4 as b) x
) t;
I don't have access to a TD system to test, but you might be able to remove one of the nested SELECTs from the answer above:
select x.*
from (
select 1 as a, 2 as b
union all
select 3 as a, 4 as b
) x
If you need to generate some random rows, you can always do a SELECT from a system table, like sys_calendar.calendar:
SELECT 1, 2
FROM sys_calendar.calendar
SAMPLE 10;
Updated example:
SELECT TOP 1000 -- Limit to 1000 rows (you can use SAMPLE too)
ROW_NUMBER() OVER() MyNum, -- Sequential numbering
MyNum MOD 7, -- Modulo operator
RANDOM(1,1000), -- Random number between 1 and 1000
HASHROW(MyNum) -- Rowhash value of given column(s)
FROM sys_calendar.calendar; -- Use as table to source rows
A couple notes:
make sure you pick a system table that will always be present and have rows
if you need more rows than are available in the source table, do a UNION to get more rows
you can always easily create a one-column table and populate it to whatever number of rows you want by INSERT/SELECT into it:
CREATE TABLE DummyTable (c1 INT); -- Create table
INSERT INTO DummyTable VALUES (1); -- Seed table
INSERT INTO DummyTable SELECT * FROM DummyTable; -- Run this to double the row count as many times as you want
Then use this table to create whatever resultset you want, similar to the query above with sys_calendar.calendar.
I don't have a TD system to test so you might get syntax errors...but that should give you a basic idea.
I am a bit late to this thread, but recently ran into the same problem.
I solved this by simply using
select distinct 1 as a, 2 as b from DBC.tables
union all
select distinct 3 as a, 4 as b from DBC.tables
Here, DBC.Tables is a data-dictionary table with relatively few rows, so the query runs fast as well.
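For what it's worth, the UNION ALL rewrite used by the answers above is the portable form of a VALUES-as-a-table clause. A quick sanity check of that shape with Python's sqlite3 (not Teradata, but the same SQL pattern):

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# The UNION ALL rewrite of SELECT ... FROM (VALUES ...):
# each SELECT contributes one literal row, no base table required
rows = conn.execute("""
    SELECT 1 AS a, 2 AS b
    UNION ALL
    SELECT 3, 4
""").fetchall()

print(rows)  # [(1, 2), (3, 4)]
```

Column names only need to be given in the first branch; subsequent branches are matched by position.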

Derive groups of records that match over multiple columns, but where some column values might be NULL

I would like an efficient means of deriving groups of matching records across multiple fields. Let's say I have the following table:
CREATE TABLE cust
(
id INT NOT NULL,
class VARCHAR(1) NULL,
cust_type VARCHAR(1) NULL,
terms VARCHAR(1) NULL
);
INSERT INTO cust
VALUES
(1,'A',NULL,'C'),
(2,NULL,'B','C'),
(3,'A','B',NULL),
(4,NULL,NULL,'C'),
(5,'D','E',NULL),
(6,'D',NULL,NULL);
What I am looking to get is the set of IDs for which matching values unify a set of records over the three fields (class, cust_type and terms), so that I can apply a unique ID to the group.
In the example, records 1-4 constitute one match group over the three fields, while records 5-6 form a separate match.
The following does the job:
SELECT DISTINCT
       a.id,
       DENSE_RANK() OVER (ORDER BY MAX(b.class), MAX(b.cust_type), MAX(b.terms)) AS match_group
FROM cust AS a
INNER JOIN cust AS b
    ON a.class = b.class
    OR a.cust_type = b.cust_type
    OR a.terms = b.terms
GROUP BY a.id
ORDER BY a.id;
id match_group
-- -----------
1 1
2 1
3 1
4 1
5 2
6 2
But, is there a better way? Running this query on a table of over a million rows is painful...
As Graham pointed out in the comments, the above query doesn't satisfy the requirements if another record is added that would group all the records together.
The following values should be grouped together in one group:
INSERT INTO cust
VALUES
(1,'A',NULL,'C'),
(2,NULL,'B','C'),
(3,'A','B',NULL),
(4,NULL,NULL,'C'),
(5,'D','E',NULL),
(6,'D',NULL,NULL),
(7,'D','B','C');
Would yield:
id match_group
-- -----------
1 1
2 1
3 1
4 1
5 1
6 1
...because the class value of D groups records 5, 6 and 7. The terms value of C matches records 1, 2 and 4 to that group, and the cust_type value of B (or the class value of A) pulls in record 3.
Hopefully that all makes sense.
I don't think you can do this with a (recursive) SELECT.
I did something similar (trying to identify unique households) using a temporary table and repeated updates, with the following logic:
For each class | cust_type | terms value, get the minimum id and update the temp table:
update temp
from (
    select class,        -- similarly for cust_type & terms
           min(id) as min_id
    from temp
    group by class
) x
set id = min_id
where temp.class = x.class
  and temp.id <> x.min_id;
Repeat all three updates until none of them updates a row.
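The repeated-update loop is computing connected components: rows sharing any non-NULL column value end up in one group. Outside the database, the same grouping can be sketched with a union-find over the question's sample rows (pure Python; column values as in the INSERT statements):

```python
# Rows from the question: (id, class, cust_type, terms)
rows = [
    (1, 'A', None, 'C'),
    (2, None, 'B', 'C'),
    (3, 'A', 'B', None),
    (4, None, None, 'C'),
    (5, 'D', 'E', None),
    (6, 'D', None, None),
]

def match_groups(rows):
    """Union-find: rows sharing any non-NULL column value land in one group."""
    parent = {r[0]: r[0] for r in rows}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    seen = {}                               # (column index, value) -> first row id seen
    for rid, *vals in rows:
        for col, val in enumerate(vals):
            if val is None:
                continue                    # NULLs never match anything
            if (col, val) in seen:
                union(rid, seen[(col, val)])
            else:
                seen[(col, val)] = rid

    # Number the groups 1, 2, ... in order of each group's first row id
    roots, out = {}, {}
    for rid, *_ in rows:
        out[rid] = roots.setdefault(find(rid), len(roots) + 1)
    return out

print(match_groups(rows))
# {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}

# Adding row 7 merges everything into a single group, as described above
print(match_groups(rows + [(7, 'D', 'B', 'C')]))
# {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}
```

This is a single pass over the data, which is why the approach scales to millions of rows far better than the self-join query in the question.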

Select the row_number or fake identity

I have a table that did not have an identity column added to it. I don't really need one for any specific purpose.
I had 821 rows and then did a test insert of 500 more. Now I need to check those 500 rows, and I was looking for a simple way to do something like:
select * from table where row_number > 821
I've tried ROW_NUMBER(), but there is no column to order by; I need all rows beyond the first 821 returned.
A table is an unordered bag of rows. You can't identify the first 821 rows that were inserted unless you have some column to identify the order of insertion (a trusted IDENTITY or datetime column, for example). Otherwise you are throwing a bunch of marbles on the floor and asking the first person who walks into the room to identify the first 821 marbles that fell.
Here is a very simple example that demonstrates that output order cannot be predicted, and certainly can't be relied upon to be FIFO (and also shows a case where HABO's "solution" breaks):
CREATE TABLE dbo.foo(id INT, x CHAR(1));
CREATE CLUSTERED INDEX x ON dbo.foo(x);
-- or CREATE INDEX x ON dbo.foo(x, id); -- doesn't require a clustered index to prove
INSERT dbo.foo VALUES(1,'z');
INSERT dbo.foo VALUES(2,'y');
INSERT dbo.foo VALUES(3,'x');
INSERT dbo.foo VALUES(4,'w');
INSERT dbo.foo VALUES(5,'v');
INSERT dbo.foo VALUES(6,'u');
INSERT dbo.foo VALUES(7,'t');
INSERT dbo.foo VALUES(8,'s');
SELECT TOP (5) id, x, ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM dbo.foo;
SELECT * FROM dbo.foo;
Results:
---- ---- ----
8 s 1
7 t 2
6 u 3
5 v 4
4 w 5
---- ----
8 s
7 t
6 u
5 v
4 w
3 x
2 y
1 z
The following is a hack that may help, but is NOT a generally useful solution:
Row_Number() over (order by (select NULL)) as UntrustworthyRowNumber

Subtract Values from Two Different Tables

Consider table X:
A
-
1
2
3
3
6
Consider table Y:
A
-
0
4
2
1
9
How do you write a query that takes the difference between these two tables, to compute the following table (say table Z):
A
-
1
-2
1
2
-3
It's not clear what you want. Could it be this?
SELECT (SELECT SUM(A) FROM X) -
(SELECT SUM(A) FROM Y)
AS MyValue
Marcelo is 100% right: in a true relational database, the order of a result set is never guaranteed. That said, some databases do tend to return sets in a consistent order.
So if you are willing to risk it, here is one solution. Make two tables with autoincrement keys like this:
CREATE TABLE Sets (
id integer identity(1,1)
, val decimal
)
CREATE TABLE SetY (
id integer identity(1,1)
, val decimal
)
Then fill them with the X and Y values:
INSERT INTO Sets (val) (SELECT * FROM X)
INSERT INTO SetY (val) (SELECT * FROM Y)
Then you can do this to get your answer:
SELECT X.ID, X.Val, Y.Val, X.val-Y.val as Difference
FROM Sets X
LEFT OUTER JOIN SetY Y
ON Y.id = X.ID
I would cross my fingers first though! If there is any way you can get a proper key in your table, please do so.
Cheers,
Daniel
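Once the two tables can be paired reliably (the big "if" discussed above), the row-by-row arithmetic is trivial. A pure-Python sketch of the join-on-generated-id idea, using the question's sample values:

```python
# Values from tables X and Y, in the order shown in the question.
# The whole point of the answer above: this ordering is only trustworthy
# once each row carries an explicit key (the identity columns).
x = [1, 2, 3, 3, 6]
y = [0, 4, 2, 1, 9]

# Simulate the identity columns: pair the rows by position,
# like the LEFT OUTER JOIN on Sets.id = SetY.id in the answer
z = [(i + 1, a, b, a - b) for i, (a, b) in enumerate(zip(x, y))]

for row_id, a, b, diff in z:
    print(row_id, a, b, diff)

print([d for *_, d in z])  # [1, -2, 1, 2, -3]
```

The last line reproduces table Z from the question.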