How to insert the result of an aggregate SQL query into separate tables

I have a table with an index, and I am running an aggregate SQL query using SUM.
You can see what I am doing in this SQL Fiddle:
Create table TX (
i int NOT NULL PRIMARY KEY,
x1 DECIMAL(7,3),
x2 DECIMAL(7,3),
x3 DECIMAL(7,3)
);
INSERT INTO TX (i,x1,x2,x3) values
(1,5, 6,6) ;
INSERT INTO TX (i,x1,x2,x3) values
(2,6, 7, 5);
INSERT INTO TX (i,x1,x2,x3) values
(3,5, 6, 7) ;
INSERT INTO TX (i,x1,x2,x3) values
(4,6, 7, 4);
My question is: how can I insert the results of this query into 3 different tables?
SELECT SUM(1),
SUM(x1),SUM(x2),SUM(x3),
SUM(x1*x1),
SUM(x2*x1),SUM(x2*x2),
SUM(x3*x1),SUM(x3*x2),SUM(x3*x3)
FROM TX
So how can I get something like this:
Sum(1)
-----
n
index Sums
------------
1 4
2 22
3 26
index1 index2 Mult
----------------------
1 1 122
2 1 144
2 2 170
3 1 119
3 2 141
3 3 126
Instead of
SUM(1) SUM(X1) SUM(X2) SUM(X3) SUM(X1*X1) SUM(X2*X1) SUM(X2*X2) SUM(X3*X1) SUM(X3*X2) SUM(X3*X3)
_____________________________________________________________________________________________________
4 22 26 22 122 144 170 119 141 126

Run 3 separate queries. How you turn the SELECTs into INSERTs depends on the RDBMS. In SQL Server, add INTO newTableName before the FROM clause to create a new table, or put INSERT INTO existingTableName before the SELECT statement to fill an existing one.
Query 1:
SELECT COUNT(*) AS SUM1
FROM TX
Results:
| SUM1 |
--------
| 4 |
Query 2:
SELECT 1 AS index1, SUM(x1) AS sums FROM TX UNION ALL
SELECT 2, SUM(x2) FROM TX UNION ALL
SELECT 3, SUM(x3) FROM TX
Results:
| INDEX1 | SUMS |
-----------------
| 1 | 22 |
| 2 | 26 |
| 3 | 22 |
Query 3:
SELECT x.index1,
x.index2,
case x.id
when 1 then SUM(x1*x1)
when 2 then SUM(x2*x1)
when 3 then SUM(x2*x2)
when 4 then SUM(x3*x1)
when 5 then SUM(x3*x2)
when 6 then SUM(x3*x3)
end Mult
FROM TX
CROSS JOIN
(select 1 id, 1 index1, 1 index2 union all
select 2 id, 2 index1, 1 index2 union all
select 3 id, 2 index1, 2 index2 union all
select 4 id, 3 index1, 1 index2 union all
select 5 id, 3 index1, 2 index2 union all
select 6 id, 3 index1, 3 index2) x
GROUP BY x.id, x.index1, x.index2
ORDER BY x.id
Results:
| INDEX1 | INDEX2 | MULT |
--------------------------
| 1 | 1 | 122 |
| 2 | 1 | 144 |
| 2 | 2 | 170 |
| 3 | 1 | 119 |
| 3 | 2 | 141 |
| 3 | 3 | 126 |

SELECT SUM(1)
FROM TX;
SELECT 1, SUM(x1)
FROM TX
UNION ALL
SELECT 2, SUM(x2)
FROM TX
UNION ALL
SELECT 3, SUM(x3)
FROM TX;
SELECT a.x i1, b.x i2, SUM(a.s * b.s)
FROM
(
SELECT i, 1 x, x1 s
FROM TX
UNION ALL
SELECT i, 2 x, x2 s
FROM TX
UNION ALL
SELECT i, 3 x, x3 s
FROM TX
) a
INNER JOIN
(
SELECT i, 1 x, x1 s
FROM TX
UNION ALL
SELECT i, 2 x, x2 s
FROM TX
UNION ALL
SELECT i, 3 x, x3 s
FROM TX
) b ON a.i = b.i AND a.x >= b.x
GROUP BY a.x, b.x;
Here is a SQL Fiddle using your data. Note that the sums your data produces (second query) do not match those in your question; I assume that is a typo.
I got a bit lazy with the third query: instead of writing out every product, I flattened the table first and joined it to itself.
Also note that SUM(1) in the first query can be replaced with COUNT(*).
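To show the INSERT step end to end, here is a minimal sketch that runs the three queries above as INSERT INTO ... SELECT statements. It uses Python's built-in sqlite3 module only so that it is self-contained; the target table names (tx_count, tx_sums, tx_mult), their column names, and the helper view flat are my own assumptions, and the exact INSERT syntax may differ slightly in your RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TX (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL, x3 REAL);
INSERT INTO TX (i, x1, x2, x3) VALUES (1,5,6,6), (2,6,7,5), (3,5,6,7), (4,6,7,4);

-- Hypothetical target tables, one per result shape.
CREATE TABLE tx_count (n INTEGER);
CREATE TABLE tx_sums (idx INTEGER, total REAL);
CREATE TABLE tx_mult (idx1 INTEGER, idx2 INTEGER, mult REAL);

-- Flattened form of TX: one row per (i, column number, value).
CREATE TEMP VIEW flat AS
SELECT i, 1 AS x, x1 AS s FROM TX UNION ALL
SELECT i, 2, x2 FROM TX UNION ALL
SELECT i, 3, x3 FROM TX;

INSERT INTO tx_count (n)
SELECT COUNT(*) FROM TX;

INSERT INTO tx_sums (idx, total)
SELECT x, SUM(s) FROM flat GROUP BY x;

INSERT INTO tx_mult (idx1, idx2, mult)
SELECT a.x, b.x, SUM(a.s * b.s)
FROM flat AS a
JOIN flat AS b ON a.i = b.i AND a.x >= b.x
GROUP BY a.x, b.x;
""")

count_rows = conn.execute("SELECT n FROM tx_count").fetchall()
sums_rows = conn.execute("SELECT idx, total FROM tx_sums ORDER BY idx").fetchall()
mult_rows = conn.execute(
    "SELECT idx1, idx2, mult FROM tx_mult ORDER BY idx1, idx2"
).fetchall()

print(count_rows)
print(sums_rows)
print(mult_rows)
```

With the sample data this fills tx_sums with 22, 26 and 22, and tx_mult with the six products in the order the question asks for.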

Related

How to create a single ID across separate entity IDs based on shared attributes using SQL?

Background: I have a SQL table that contains acct_ids and the component_ids, which indicate the component parts used by each account. Accounts can have multiple components, and the same component can be used by multiple accounts.
Objective: I would like to 'dedupe' acct_ids based on shared component_ids; so if any acct_id shares any component_id with any other acct_id, combine those into a single new id. I'm using BigQuery SQL to do this.
Here is a sample of the data table:
+---------+--------------+
| acct_id | component_id |
+---------+--------------+
| 1 | A |
| 1 | B |
| 1 | C |
| 2 | C |
| 2 | D |
| 2 | E |
| 3 | G |
| 3 | E |
| 3 | F |
| 4 | H |
| 4 | I |
| 5 | H |
| 5 | J |
+---------+--------------+
For instance, acct_ids 1 and 2 share component_id C, and acct_ids 2 and 3 share component_id E, so all 3 of these acct_ids should be labeled with a single, shared id (new_id = 1). Similarly, acct_ids 4 and 5 share component_id H, so both of these acct_ids should be labeled with a single, shared id (new_id = 2).
For the sample data above, the desired output would be:
+---------+--------------+--------+
| acct_id | component_id | new_id |
+---------+--------------+--------+
| 1 | A | 1 |
| 1 | B | 1 |
| 1 | C | 1 |
| 2 | C | 1 |
| 2 | D | 1 |
| 2 | E | 1 |
| 3 | G | 1 |
| 3 | E | 1 |
| 3 | F | 1 |
| 4 | H | 2 |
| 4 | I | 2 |
| 5 | H | 2 |
| 5 | J | 2 |
+---------+--------------+--------+
I've been thinking through ways to tackle this - perhaps an approach incorporating FULL OUTER JOIN is where to start, but I haven't been able to build a cohesive query that gets there yet.
Any suggestions?
You need to use the syntax
CASE <field_name> WHEN <value> THEN <new_value>
I tried it in BigQuery on my data, and the following works for me:
SELECT
CASE component_id
WHEN 'A' THEN '1'
WHEN 'B' THEN '1'
WHEN 'J' THEN '2'
ELSE '0'
END AS new_component_id, *
FROM `<project>.<dataset>.<table>` LIMIT 1000
You may also be interested in the documentation for conditional expressions in Standard SQL.
The following is for BigQuery Standard SQL:
DECLARE rows_count, run_away_stop INT64 DEFAULT 0;
CREATE TEMP TABLE ttt AS
SELECT ARRAY_AGG(component_id ORDER BY component_id) arr
FROM `project.dataset.your_table`
GROUP BY acct_id;
LOOP
SET rows_count = (SELECT COUNT(1) FROM ttt);
SET run_away_stop = run_away_stop + 1;
CREATE OR REPLACE TEMP TABLE ttt AS
SELECT ANY_VALUE(arr) arr FROM (
SELECT ARRAY(SELECT DISTINCT val FROM UNNEST(arr) val ORDER BY val) arr
FROM (
SELECT ANY_VALUE(arr1) arr1, ARRAY_CONCAT_AGG(arr) arr
FROM (
SELECT t1.arr arr1, t2.arr arr2, ARRAY(SELECT DISTINCT val FROM UNNEST(ARRAY_CONCAT( t1.arr, t2.arr)) val ORDER BY val) arr
FROM ttt t1, ttt t2
WHERE (SELECT COUNT(1) FROM UNNEST(t1.arr) val JOIN UNNEST(t2.arr) val USING(val)) > 0
) GROUP BY FORMAT('%t', arr1)
)
) GROUP BY FORMAT('%t', arr);
IF (rows_count = (SELECT COUNT(1) FROM ttt) AND run_away_stop > 1) OR run_away_stop > 10 THEN BREAK; END IF;
END LOOP;
SELECT acct_id, component_id, new_id FROM `project.dataset.your_table`
JOIN (SELECT ROW_NUMBER() OVER() new_id, arr FROM ttt)
ON component_id IN UNNEST(arr);
As you can see, the above uses the recently introduced scripting feature. Applied to the sample data from your question, the final output is:
Row acct_id component_id new_id
1 1 A 2
2 1 B 2
3 1 C 2
4 2 C 2
5 2 D 2
6 2 E 2
7 3 E 2
8 3 F 2
9 3 G 2
10 4 H 1
11 4 I 1
12 5 H 1
13 5 J 1
When applying this to real data, make sure you set an appropriate maximum for run_away_stop (in the script above it is 10; see the last statement within the LOOP).
By the way, you can test and play with the above using the "playground" below, which mimics the sample data from your question:
DECLARE rows_count, run_away_stop INT64 DEFAULT 0;
CREATE TEMP TABLE input AS (
SELECT 1 acct_id, 'A' component_id UNION ALL
SELECT 1, 'B' UNION ALL
SELECT 1, 'C' UNION ALL
SELECT 2, 'C' UNION ALL
SELECT 2, 'D' UNION ALL
SELECT 2, 'E' UNION ALL
SELECT 3, 'G' UNION ALL
SELECT 3, 'E' UNION ALL
SELECT 3, 'F' UNION ALL
SELECT 4, 'H' UNION ALL
SELECT 4, 'I' UNION ALL
SELECT 5, 'H' UNION ALL
SELECT 5, 'J'
);
CREATE TEMP TABLE ttt AS
SELECT ARRAY_AGG(component_id ORDER BY component_id) arr
FROM input
GROUP BY acct_id;
LOOP
SET rows_count = (SELECT COUNT(1) FROM ttt);
SET run_away_stop = run_away_stop + 1;
CREATE OR REPLACE TEMP TABLE ttt AS
SELECT ANY_VALUE(arr) arr FROM (
SELECT ARRAY(SELECT DISTINCT val FROM UNNEST(arr) val ORDER BY val) arr
FROM (
SELECT ANY_VALUE(arr1) arr1, ARRAY_CONCAT_AGG(arr) arr
FROM (
SELECT t1.arr arr1, t2.arr arr2, ARRAY(SELECT DISTINCT val FROM UNNEST(ARRAY_CONCAT( t1.arr, t2.arr)) val ORDER BY val) arr
FROM ttt t1, ttt t2
WHERE (SELECT COUNT(1) FROM UNNEST(t1.arr) val JOIN UNNEST(t2.arr) val USING(val)) > 0
) GROUP BY FORMAT('%t', arr1)
)
) GROUP BY FORMAT('%t', arr);
IF (rows_count = (SELECT COUNT(1) FROM ttt) AND run_away_stop > 1) OR run_away_stop > 10 THEN BREAK; END IF;
END LOOP;
SELECT acct_id, component_id, new_id FROM input
JOIN (SELECT ROW_NUMBER() OVER() new_id, arr FROM ttt)
ON component_id IN UNNEST(arr)
ORDER BY acct_id, component_id;
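For anyone who wants to check results outside BigQuery: what the question asks for is the set of connected components of the graph that links accounts to components. Below is a minimal sketch in plain Python using union-find, which is a different technique from the scripting loop above. The function name assign_group_ids is my own, and new_id is numbered in order of first appearance, which is an assumption (any consistent labeling satisfies the question).

```python
def assign_group_ids(rows):
    """rows: list of (acct_id, component_id). Returns {acct_id: new_id}."""
    parent = {}

    def find(node):
        # Follow parent links to the representative of the node's set.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every account to each component it uses. Tagging the nodes
    # keeps account ids and component ids from colliding.
    for acct_id, component_id in rows:
        union(("acct", acct_id), ("comp", component_id))

    # Number the sets in order of first appearance.
    ids_by_root = {}
    result = {}
    for acct_id, _ in rows:
        root = find(("acct", acct_id))
        if root not in ids_by_root:
            ids_by_root[root] = len(ids_by_root) + 1
        result[acct_id] = ids_by_root[root]
    return result


sample = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "C"), (2, "D"), (2, "E"),
    (3, "G"), (3, "E"), (3, "F"),
    (4, "H"), (4, "I"),
    (5, "H"), (5, "J"),
]
groups = assign_group_ids(sample)
print(groups)
```

On the sample data, accounts 1, 2 and 3 end up in one group and accounts 4 and 5 in another, matching the desired output in the question.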

Select statement that duplicates rows based on N value of column

I have a Power table that stores building circuit details. A circuit can be 1-phase or 3-phase, but it is always represented as 1 row in the circuit table.
I want to insert the details of the circuits into a join table that joins panels to circuits.
My current circuit table has the following details:
CircuitID | Voltage | Phase | PanelID | Cct |
1 | 120 | 1 | 1 | 1 |
2 | 208 | 3 | 1 | 3 |
3 | 208 | 2 | 1 | 8 |
Is it possible to write a SELECT that, when it sees a 3-phase row, returns 3 rows (or 2 rows for a 2-phase row) and increments the Cct column by 1 each time, or do I have to write a loop?
CircuitID | PanelID | Cct |
1 | 1 | 1 |
2 | 1 | 3 |
2 | 1 | 4 |
2 | 1 | 5 |
3 | 1 | 8 |
3 | 1 | 9 |
Here is one way to do it.
First generate numbers using a tally table (the best approach). There is an excellent article about generating numbers without loops: Generate a set or sequence without loops.
Then join the numbers table to your table on the condition that each row's Phase value is greater than or equal to the sequence number from the numbers table:
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2), -- 10*100
numbers as ( SELECT n = ROW_NUMBER() OVER (ORDER BY n) FROM e3 )
SELECT CircuitID,
PanelID,
Cct = Cct + ( n - 1 )
FROM Yourtable a
JOIN numbers b
ON a.Phase >= b.n
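To see the join condition at work on the sample rows, here is a small sketch using Python's built-in sqlite3 module. It swaps the stacked CTEs for a tiny hard-coded numbers table (an assumption made for brevity; three rows are enough because Phase never exceeds 3 in the sample).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Power (
    CircuitID INTEGER, Voltage INTEGER, Phase INTEGER,
    PanelID INTEGER, Cct INTEGER
);
INSERT INTO Power VALUES (1, 120, 1, 1, 1), (2, 208, 3, 1, 3), (3, 208, 2, 1, 8);

-- Stand-in for the tally table: just the numbers 1..3.
CREATE TABLE numbers (n INTEGER PRIMARY KEY);
INSERT INTO numbers (n) VALUES (1), (2), (3);
""")

# Each circuit matches as many numbers as it has phases,
# and n - 1 is the offset added to the starting Cct.
expanded = conn.execute("""
SELECT p.CircuitID, p.PanelID, p.Cct + (nums.n - 1) AS Cct
FROM Power AS p
JOIN numbers AS nums ON p.Phase >= nums.n
ORDER BY p.CircuitID, nums.n
""").fetchall()

for row in expanded:
    print(row)
```

The 3-phase circuit comes back as three rows with Cct 3, 4 and 5, and the 2-phase circuit as two rows with Cct 8 and 9.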
You can do this with a single recursive CTE.
WITH cte AS
(
SELECT [CircuitID], [Voltage], [Phase], [PanelID], [Cct], [Cct] AS [Ref]
FROM [Power]
UNION ALL
SELECT [CircuitID], [Voltage], [Phase], [PanelID], [Cct] + 1, [Ref]
FROM cte
WHERE [Cct] + 1 < [Phase] + [Ref]
)
SELECT [CircuitID], [PanelID], [Cct]
FROM cte
ORDER BY [CircuitID]
The simplest way:
Select y.CircuitID, y.PanelID, y.Cct + x.x - 1 as Cct from (
Select 1 CircuitID,120 Voltage,1 Phase,1 PanelID, 1 Cct
union
Select 2,208,3,1,3
union
Select 3,208,2,1,8)y,
(Select 1 x
union
Select 2 x
union
Select 3 x)x
Where x.x <= y.Phase
Copy and paste this directly and try it; it runs as-is. After that, just replace my 'y' table with your real table.

Empty column values but keep values equal to the number provided using SQL Query

Suppose the following table
ID Name RowNumber
2314 YY 1
213 XH 2
421 XD 3
123 AA 4
213 QQQ 5
12 WW 6
312 RR 7
123 GG 8
12 F 9
12 FF 10
312 VV 11
12 BB 12
32 NN 13
43 DD 14
53 DD 15
658 QQQQ 16
768 GGG 17
I want to replace the Name field with an empty string, subject to these conditions:
The first and last cells' values must not be removed.
The preserved values must not be in contiguous cells.
Only n cells will be preserved.
If n is less than or equal to the number entered by the user, then do nothing.
For example, if the user enters 5, then only 5 values will be preserved, and the result should be this (or similar):
ID Name RowNumber
2314 YY 1
213 2
421 3
123 AA 4
213 5
12 6
312 7
123 GG 8
12 9
12 10
312 11
12 12
32 NN 13
43 14
53 15
658 16
768 GGG 17
There could be more records than this.
I'm using SQL Server
The following works in SQL Server 2012+ because it uses a running (cumulative) SUM. The query assumes that the values in the RowNumber column are sequential from 1 to the total row count, without gaps. If your data is not like this, you can use ROW_NUMBER to generate them.
Calculate the ratio of N-1 (where N is the given number) to the total number of rows (CTE_Ratio).
Calculate the running sum of this ratio, truncating the fractional part of the sum (CTE_Groups).
Each integer value of the running sum defines a group of rows; renumber the rows within each group (CTE_Final).
Preserve Name only for the first row of each group.
To better understand how it works, include the intermediate columns (Ratio, GroupNumber, rn) in the output.
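The grouping idea in these steps can be tried outside the database. The sketch below is plain Python, not the T-SQL itself; the function name rows_to_keep is made up for illustration. It uses integer arithmetic, floor(row * (N-1) / total), which equals the truncated running sum of the ratio and sidesteps floating-point rounding on the last row. It assumes N is at least 2.

```python
def rows_to_keep(total_rows, n):
    """Return the 1-based row numbers whose Name would be preserved."""
    if n >= total_rows:
        # Nothing to blank out when every row may be kept.
        return list(range(1, total_rows + 1))

    kept = []
    previous_group = None
    for row_number in range(1, total_rows + 1):
        # Truncated running sum of (n - 1) / total_rows up to this row.
        group = (row_number * (n - 1)) // total_rows
        # Keep the first row of every group, and always keep the last row.
        if group != previous_group or row_number == total_rows:
            kept.append(row_number)
        previous_group = group
    return kept


kept_rows = rows_to_keep(17, 5)
print(kept_rows)
```

For 17 rows and N = 5 this keeps rows 1, 5, 9, 13 and 17, which are the rows that still have a Name in the result table of this answer.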
Sample data
DECLARE @T TABLE ([ID] int, [Name] varchar(50), [RowNumber] int);
INSERT INTO @T([ID], [Name], [RowNumber]) VALUES
(2314, 'YY', 1)
,(213, 'XH', 2)
,(421, 'XD', 3)
,(123, 'AA', 4)
,(213, 'QQQ', 5)
,(12, 'WW', 6)
,(312, 'RR', 7)
,(123, 'GG', 8)
,(12, 'F', 9)
,(12, 'FF', 10)
,(312, 'VV', 11)
,(12, 'BB', 12)
,(32, 'NN', 13)
,(43, 'DD', 14)
,(53, 'DD', 15)
,(658, 'QQQQ', 16)
,(768, 'GGG', 17);
DECLARE @N int = 5;
Query
WITH
CTE_Ratio AS
(
SELECT
ID
,Name
,RowNumber
,COUNT(*) OVER() AS TotalRows
,CAST(@N-1 AS float) / CAST(COUNT(*) OVER() AS float) AS Ratio
FROM @T
)
,CTE_Groups AS
(
SELECT
ID
,Name
,RowNumber
,TotalRows
,ROUND(SUM(Ratio) OVER(ORDER BY RowNumber), 0, 1) AS GroupNumber
FROM CTE_Ratio
)
,CTE_Final AS
(
SELECT
ID
,Name
,RowNumber
,TotalRows
,ROW_NUMBER() OVER(PARTITION BY GroupNumber ORDER BY RowNumber) AS rn
FROM CTE_Groups
)
SELECT
ID
,CASE WHEN rn=1 OR RowNumber = TotalRows THEN Name ELSE '' END AS Name
,RowNumber
FROM CTE_Final
ORDER BY RowNumber;
Result
+------+------+-----------+
| ID | Name | RowNumber |
+------+------+-----------+
| 2314 | YY | 1 |
| 213 | | 2 |
| 421 | | 3 |
| 123 | | 4 |
| 213 | QQQ | 5 |
| 12 | | 6 |
| 312 | | 7 |
| 123 | | 8 |
| 12 | F | 9 |
| 12 | | 10 |
| 312 | | 11 |
| 12 | | 12 |
| 32 | NN | 13 |
| 43 | | 14 |
| 53 | | 15 |
| 658 | | 16 |
| 768 | GGG | 17 |
+------+------+-----------+
Try this:
--Number that the user enters
DECLARE @InputNumber INT
DECLARE @WorkingNumber INT
DECLARE @TotalRecords INT
DECLARE @Devider DECIMAL(18,2)
SET @InputNumber = 5
SET @WorkingNumber = @InputNumber - 2
--Assumes @InputNumber is greater than 2 and @TotalRecords is greater than 4
--[Table] stands for your real table name
SELECT @TotalRecords = COUNT(*)
FROM [Table];
SET @Devider = CONVERT(DECIMAL(18,2), @TotalRecords)/CONVERT(DECIMAL(18,2), @WorkingNumber);
WITH Conditioned (RowNumber)
AS
(
SELECT RowNumber
FROM [Table]
WHERE RowNumber = 1
UNION ALL
--Step from the previously kept row to the next one, @Devider rows further on
SELECT T.RowNumber
FROM Conditioned AS C
INNER JOIN [Table] AS T ON CONVERT(INT, CEILING(C.RowNumber + @Devider)) = T.RowNumber
)
SELECT T.Id, CASE WHEN C.RowNumber IS NULL THEN '' ELSE T.Name END AS Name, T.RowNumber
FROM [Table] T
LEFT OUTER JOIN Conditioned C ON T.RowNumber = C.RowNumber
WHERE T.RowNumber != @TotalRecords
UNION
SELECT Id, Name, RowNumber
FROM [Table]
WHERE RowNumber = @TotalRecords

SQL Unpivot multiple columns Data

I am using SQL Server 2008 and I am trying to unpivot the data. Here is the SQL code that I am using:
CREATE TABLE #pvt1 (VendorID int, Sa int, Emp1 int,Sa1 int,Emp2 int)
GO
INSERT INTO #pvt1 VALUES (1,2,4,3,9);
GO
--Unpivot the table.
SELECT distinct VendorID,Orders,Orders1
FROM
(SELECT VendorID, Emp1, Sa,Emp2,Sa1
FROM #pvt1 ) p
UNPIVOT
(Orders FOR Emp IN
(Emp1,Emp2)
)AS unpvt
UNPIVOT
(Orders1 FOR Emp1 IN
(Sa,Sa1)
)AS unpvt1;
GO
Here is the result of the above code:
VendorID Orders Orders1
1 4 2
1 4 3
1 9 2
1 9 3
But I want my output to look like this:
VendorID Orders Orders1
1 4 2
1 9 3
The relationship in the data above is that 2 is related to 4, and 3 is related to 9.
How can I achieve this?
An easier way to unpivot the data would be to use a CROSS APPLY to unpivot the columns in pairs:
select vendorid, orders, orders1
from pvt1
cross apply
(
select emp1, sa union all
select emp2, sa1
) c (orders, orders1);
Or, if you don't want to use UNION ALL, you can use CROSS APPLY with the VALUES clause:
select vendorid, orders, orders1
from pvt1
cross apply
(
values
(emp1, sa),
(emp2, sa1)
) c (orders, orders1);
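If you want to check the pairing on an engine without CROSS APPLY, the same column pairs can be produced with one SELECT per pair glued together by UNION ALL. This is only a sketch of that portable equivalent, run through Python's built-in sqlite3 module; it is not the CROSS APPLY syntax itself, and it scans the table once per pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pvt1 (VendorID INTEGER, Sa INTEGER, Emp1 INTEGER, Sa1 INTEGER, Emp2 INTEGER);
INSERT INTO pvt1 VALUES (1, 2, 4, 3, 9);
""")

# One SELECT per (Emp, Sa) pair keeps related values on the same output row.
pairs = conn.execute("""
SELECT VendorID, Emp1 AS Orders, Sa AS Orders1 FROM pvt1
UNION ALL
SELECT VendorID, Emp2, Sa1 FROM pvt1
ORDER BY VendorID, Orders
""").fetchall()

for row in pairs:
    print(row)
```

For the sample row this returns two rows, (1, 4, 2) and (1, 9, 3), the output the question asks for.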
The answer by Taryn is indeed very useful, and I'd like to expand on one aspect of it.
Suppose you have a heavily denormalized table like this, with multiple sets of columns for, e.g., 4 quarters or 12 months:
+-------+------+------+------+------+------+------+-------+------+
| cYear | foo1 | foo2 | foo3 | foo4 | bar1 | bar2 | bar3 | bar4 |
+-------+------+------+------+------+------+------+-------+------+
| 2020 | 42 | 888 | 0 | 33 | one | two | three | four |
+-------+------+------+------+------+------+------+-------+------+
Then the CROSS APPLY method is easy to write and understand once you get the hang of it. For the numbered column, use constant values.
SELECT
cYear,
cQuarter,
foo,
bar
FROM temp
CROSS APPLY
(
VALUES
(1, foo1, bar1),
(2, foo2, bar2),
(3, foo3, bar3),
(4, foo4, bar4)
) c (cQuarter, foo, bar)
Result:
+-------+----------+-----+-------+
| cYear | cQuarter | foo | bar |
+-------+----------+-----+-------+
| 2020 | 1 | 42 | one |
| 2020 | 2 | 888 | two |
| 2020 | 3 | 0 | three |
| 2020 | 4 | 33 | four |
+-------+----------+-----+-------+
I needed a composite key and also wanted to skip the extra row when data is missing (NULLs), for example when x2 and y2 are an optional replacement vendor and price:
WITH pvt AS (SELECT * FROM (VALUES
( 1, 6, 11, 111, 12, 13, 122, 133),
( 2, 6, 21, 211, 22, 23, 222, 233),
( 3, 6, 31, 311, 32, 33, 322, 333),
( 5, 4, 41, 411, 42, NULL, 422, NULL),
( 6, 4, 51, 511, 52, NULL, 522, NULL))
s( id, s, a, b, x1, x2, y1, y2)
)
-- SELECT * FROM pvt
SELECT CONCAT('xy_',s,'_', id, postfix) as comp_id, a, b, x, y
FROM pvt
CROSS APPLY
(
VALUES
(NULL, x1, y1),
('_ext', x2, y2)
) c (postfix, x, y)
WHERE x IS NOT NULL
produces
comp_id a b x y
-------------------------------- ----------- ----------- ----------- -----------
xy_6_1 11 111 12 122
xy_6_1_ext 11 111 13 133
xy_6_2 21 211 22 222
xy_6_2_ext 21 211 23 233
xy_6_3 31 311 32 322
xy_6_3_ext 31 311 33 333
xy_4_5 41 411 42 422
xy_4_6 51 511 52 522
(8 rows affected)
from:
id s a b x1 x2 y1 y2
----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
1 6 11 111 12 13 122 133
2 6 21 211 22 23 222 233
3 6 31 311 32 33 322 333
5 4 41 411 42 NULL 422 NULL
6 4 51 511 52 NULL 522 NULL
(5 rows affected)

Recursive SQL statement (Postgresql) - simplified version

This is a simplified version of a more complicated question posted here:
Recursive SQL statement (PostgreSQL 9.1.4)
Simplified question
Suppose you have an upper triangular matrix stored in 3 columns (RowIndex, ColumnIndex, MatrixValue):
ColumnIndex
1 2 3 4 5
1 2 2 3 3 4
2 4 4 5 6 X
3 3 2 2 X X
4 2 1 X X X
5 1 X X X X
X values are to be calculated using the following algorithm:
M[i,j] = (M[i-1,j]+M[i,j-1])/2
(i= rows, j = columns, M=matrix)
Example:
M[3,4] = (M[2,4]+M[3,3])/2
M[3,5] = (M[2,5]+M[3,4])/2
The full required result is:
ColumnIndex
1 2 3 4 5
1 2 2 3 3 4
2 4 4 5 6 5
3 3 2 2 4 4.5
4 2 1 1.5 2.75 3.625
5 1 1 1.25 2.00 2.8125
Sample data:
create table matrix_data (
RowIndex integer,
ColumnIndex integer,
MatrixValue numeric);
insert into matrix_data values (1,1,2);
insert into matrix_data values (1,2,2);
insert into matrix_data values (1,3,3);
insert into matrix_data values (1,4,3);
insert into matrix_data values (1,5,4);
insert into matrix_data values (2,1,4);
insert into matrix_data values (2,2,4);
insert into matrix_data values (2,3,5);
insert into matrix_data values (2,4,6);
insert into matrix_data values (3,1,3);
insert into matrix_data values (3,2,2);
insert into matrix_data values (3,3,2);
insert into matrix_data values (4,1,2);
insert into matrix_data values (4,2,1);
insert into matrix_data values (5,1,1);
Can this be done?
Test setup:
CREATE TEMP TABLE matrix (
rowindex integer,
columnindex integer,
matrixvalue numeric);
INSERT INTO matrix VALUES
(1,1,2),(1,2,2),(1,3,3),(1,4,3),(1,5,4)
,(2,1,4),(2,2,4),(2,3,5),(2,4,6)
,(3,1,3),(3,2,2),(3,3,2)
,(4,1,2),(4,2,1)
,(5,1,1);
Run INSERTs in a LOOP with DO:
DO $$
BEGIN
FOR i IN 2 .. 5 LOOP
FOR j IN 7-i .. 5 LOOP
INSERT INTO matrix
VALUES (i,j, (
SELECT sum(matrixvalue)/2
FROM matrix
WHERE (rowindex, columnindex) IN ((i-1, j),(i, j-1))
));
END LOOP;
END LOOP;
END;
$$;
See result:
SELECT * FROM matrix order BY 1,2;
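To check the loop bounds without a database, the same fill order can be mirrored in plain Python. This is only a sketch: the function name fill_matrix is made up, and Fraction is used so the halves stay exact. The inner range starts at size + 2 - i, which is the 7-i from the DO block when size is 5.

```python
from fractions import Fraction


def fill_matrix(known, size):
    """known maps (row, column) to a value; returns the completed matrix."""
    matrix = {cell: Fraction(value) for cell, value in known.items()}
    for i in range(2, size + 1):
        # Row i is missing the columns from size + 2 - i up to size.
        for j in range(size + 2 - i, size + 1):
            matrix[(i, j)] = (matrix[(i - 1, j)] + matrix[(i, j - 1)]) / 2
    return matrix


known_cells = {
    (1, 1): 2, (1, 2): 2, (1, 3): 3, (1, 4): 3, (1, 5): 4,
    (2, 1): 4, (2, 2): 4, (2, 3): 5, (2, 4): 6,
    (3, 1): 3, (3, 2): 2, (3, 3): 2,
    (4, 1): 2, (4, 2): 1,
    (5, 1): 1,
}
filled = fill_matrix(known_cells, 5)
for i in range(1, 6):
    print([float(filled[(i, j)]) for j in range(1, 6)])
```

Each cell is computed only after the cell above it and the cell to its left exist, which is why the rows must be filled top to bottom and the columns left to right.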
This can be done in a single SQL select statement, but only because recursion is not necessary. I'll outline the solution. If you actually want the SQL code, let me know.
First, notice that the only items that contribute to the sums are along the diagonal. Now, if we follow the contribution of the value "4" in (1, 5), it contributes 4/2 to (2,5) and 4/4 to (3,5) and 4/8 to (4,5). Each time, the contribution is cut in half, because (a+b)/2 is (a/2 + b/2).
When we extend this, we start to see a pattern similar to Pascal's triangle. In fact, for any given point in the lower triangular matrix (below where you have values), you can find the diagonal elements that contribute to the value. Extend a vertical line up to hit the diagonal and a horizontal line to hit the diagonal. Those are the contributors from the diagonal row.
How much do they contribute? Well, for that we can go to Pascal's triangle. For the first diagonal below where we have values, the contributions are (1,1)/2. For the second diagonal, (1,2,1)/4. For the third, (1,3,3,1)/8 . . . and so on.
Fortunately, we can calculate the contributions for each value using a formula (the "choose" function from combinatorics). The power of 2 is easy. And, determining how far a given cell is from the diagonal is not too hard.
All of this can be combined into a single Postgres SQL statement. However, @Erwin's solution also works. I only want to put the effort into debugging the statement if his solution doesn't meet your needs.
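As a sanity check on the Pascal's triangle argument, here is a small Python sketch of the closed form (not the SQL statement). It assumes a square matrix whose known anti-diagonal is the set of cells with row + column = size + 1; the function name cell_from_diagonal and the dictionary layout are my own. A cell d steps below that diagonal is the sum of C(d, a) times the diagonal value a rows above it, divided by 2^d.

```python
from fractions import Fraction
from math import comb


def cell_from_diagonal(diagonal, size, i, j):
    """diagonal maps row r to the known value at (r, size + 1 - r)."""
    d = i + j - (size + 1)  # how many steps (i, j) lies below the diagonal
    # Going up a rows and left d - a columns lands on diagonal row i - a,
    # and there are C(d, a) such paths, each weighted by 1 / 2**d.
    total = sum(comb(d, a) * Fraction(diagonal[i - a]) for a in range(d + 1))
    return total / 2 ** d


# Anti-diagonal of the sample matrix: (1,5)=4, (2,4)=6, (3,3)=2, (4,2)=1, (5,1)=1.
sample_diagonal = {1: 4, 2: 6, 3: 2, 4: 1, 5: 1}
corner = cell_from_diagonal(sample_diagonal, 5, 5, 5)
print(float(corner))
```

For the bottom-right cell the weights are (1, 4, 6, 4, 1)/16, giving 45/16 = 2.8125, which agrees with the required result in the question.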
... and here comes the recursive CTE with multiple embedded CTE's (tm):
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE matrix_data (
yyy integer,
xxx integer,
val numeric);
insert into matrix_data (yyy,xxx,val) values
(1,1,2) , (1,2,2) , (1,3,3) , (1,4,3) , (1,5,4)
, (2,1,4) , (2,2,4) , (2,3,5) , (2,4,6)
, (3,1,3) , (3,2,2) , (3,3,2)
, (4,1,2) , (4,2,1)
, (5,1,1)
;
WITH RECURSIVE rr AS (
WITH xx AS (
SELECT MIN(xxx) AS x0
, MAX(xxx) AS x1
FROM matrix_data
)
, mimax AS (
SELECT generate_series(xx.x0,xx.x1) AS xxx
FROM xx
)
, yy AS (
SELECT MIN(yyy) AS y0
, MAX(yyy) AS y1
FROM matrix_data
)
, mimay AS (
SELECT generate_series(yy.y0,yy.y1) AS yyy
FROM yy
)
, cart AS (
SELECT * FROM mimax mm
JOIN mimay my ON (1=1)
)
, empty AS (
SELECT * FROM cart ca
WHERE NOT EXISTS (
SELECT *
FROM matrix_data nx
WHERE nx.xxx = ca.xxx
AND nx.yyy = ca.yyy
)
)
, hot AS (
SELECT * FROM empty emp
WHERE EXISTS (
SELECT *
FROM matrix_data ex
WHERE ex.xxx = emp.xxx -1
AND ex.yyy = emp.yyy
)
AND EXISTS (
SELECT *
FROM matrix_data ex
WHERE ex.xxx = emp.xxx
AND ex.yyy = emp.yyy -1
)
)
-- UPDATE from here:
SELECT h.xxx,h.yyy, md.val / 2 AS val
FROM hot h
JOIN matrix_data md ON
(md.yyy = h.yyy AND md.xxx = h.xxx-1)
OR (md.yyy = h.yyy-1 AND md.xxx = h.xxx)
UNION ALL
SELECT e.xxx,e.yyy, r.val / 2 AS val
FROM empty e
JOIN rr r ON ( e.xxx = r.xxx+1 AND e.yyy = r.yyy)
OR ( e.xxx = r.xxx AND e.yyy = r.yyy+1 )
)
INSERT INTO matrix_data(yyy,xxx,val)
SELECT DISTINCT yyy,xxx
,SUM(val)
FROM rr
GROUP BY yyy,xxx
;
SELECT * FROM matrix_data
;
New result:
NOTICE: drop cascades to table tmp.matrix_data
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 15
INSERT 0 10
yyy | xxx | val
-----+-----+------------------------
1 | 1 | 2
1 | 2 | 2
1 | 3 | 3
1 | 4 | 3
1 | 5 | 4
2 | 1 | 4
2 | 2 | 4
2 | 3 | 5
2 | 4 | 6
3 | 1 | 3
3 | 2 | 2
3 | 3 | 2
4 | 1 | 2
4 | 2 | 1
5 | 1 | 1
2 | 5 | 5.0000000000000000
5 | 5 | 2.81250000000000000000
4 | 3 | 1.50000000000000000000
3 | 5 | 4.50000000000000000000
5 | 2 | 1.00000000000000000000
3 | 4 | 4.00000000000000000000
5 | 3 | 1.25000000000000000000
4 | 5 | 3.62500000000000000000
4 | 4 | 2.75000000000000000000
5 | 4 | 2.00000000000000000000
(25 rows)
while (select max(ColumnIndex+RowIndex) from matrix_data)<10
begin
insert matrix_data
select c1.RowIndex, c1.ColumnIndex+1, (c1.MatrixValue+c2.MatrixValue)/2
from matrix_data c1
inner join
matrix_data c2
on c1.ColumnIndex+1=c2.ColumnIndex and c1.RowIndex-1 = c2.RowIndex
where c1.RowIndex+c1.ColumnIndex=(select max(RowIndex+ColumnIndex) from matrix_data)
and c1.ColumnIndex<5
end