Optimal query to fetch a cumulative sum in MySQL
What is the 'correct' query to fetch a cumulative sum in MySQL?
I have a table where I keep information about files; one column contains the size of each file in bytes (the actual files are kept on disk somewhere).
I would like to get the cumulative file size per group, like this:
+------------+---------+--------+----------------+
| fileInfoId | groupId | size   | cumulativeSize |
+------------+---------+--------+----------------+
|          1 |       1 | 522120 |         522120 |
|          2 |       2 | 316042 |         316042 |
|          4 |       2 | 711084 |        1027126 |
|          5 |       2 | 697002 |        1724128 |
|          6 |       2 | 663425 |        2387553 |
|          7 |       2 | 739553 |        3127106 |
|          8 |       2 | 700938 |        3828044 |
|          9 |       2 | 695614 |        4523658 |
|         10 |       2 | 744204 |        5267862 |
|         11 |       2 | 609022 |        5876884 |
|        ... |     ... |    ... |            ... |
+------------+---------+--------+----------------+
20000 rows in set (19.2161 sec)
Right now, I use the following query to get the above results:
SELECT
      a.fileInfoId
    , a.groupId
    , a.size
    , SUM(b.size) AS cumulativeSize
FROM fileInfo AS a
LEFT JOIN fileInfo AS b USING (groupId)
WHERE a.fileInfoId >= b.fileInfoId
GROUP BY a.fileInfoId
ORDER BY a.groupId, a.fileInfoId;
My solution is, however, extremely slow (around 19 seconds without caching). The triangular self-join compares every row against every earlier row in its group, so the work grows roughly quadratically with group size.
EXPLAIN gives the following execution details:
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
| 1 | SIMPLE | a | index | PRIMARY,foreignId | PRIMARY | 4 | NULL | 14905 | |
| 1 | SIMPLE | b | ref | PRIMARY,foreignId | foreignId | 4 | db.a.foreignId | 36 | Using where |
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
My question is:
How can I optimize the above query?
Update
I've updated the question to provide the table structure and a procedure to fill the table with 20,000 rows of test data.
CREATE TABLE `fileInfo` (
      `fileInfoId` int(10) unsigned NOT NULL AUTO_INCREMENT
    , `groupId` int(10) unsigned NOT NULL
    , `name` varchar(128) NOT NULL
    , `size` int(10) unsigned NOT NULL
    , PRIMARY KEY (`fileInfoId`)
    , KEY `groupId` (`groupId`)
) ENGINE=InnoDB;
delimiter $$
DROP PROCEDURE IF EXISTS autofill$$
CREATE PROCEDURE autofill()
BEGIN
    DECLARE i INT DEFAULT 0;
    DECLARE gid INT DEFAULT 0;
    DECLARE nam CHAR(20);
    DECLARE siz INT DEFAULT 0;
    WHILE i < 20000 DO
        -- random groupId (0-249), name, and size (just under 1 MiB) per row
        SET gid = FLOOR(RAND() * 250);
        SET nam = CONV(FLOOR(RAND() * 10000000000000), 20, 36);
        SET siz = FLOOR(RAND() * 1024 * 1024);
        INSERT INTO `fileInfo` (`groupId`, `name`, `size`) VALUES (gid, nam, siz);
        SET i = i + 1;
    END WHILE;
END$$
delimiter ;
CALL autofill();
About the possible duplicate question
The question linked by Forgotten Semicolon is not the same question. My question has an extra groupId column, and because of it the accepted answer there does not work for my problem. (Maybe it can be adapted to work, but I don't know how; hence my question.)
You could use a user variable - it's far quicker than any join. The (SELECT @total := 0) derived table initializes the variable without a separate SET statement:
SELECT
    id,
    size,
    @total := @total + size AS cumulativeSize
FROM `table`, (SELECT @total := 0) AS t;
Here's a quick test case on a Pentium III with 128MB RAM running Debian 5.0:
Create the table:
DROP TABLE IF EXISTS `table1`;
CREATE TABLE `table1` (
    `id` int(11) NOT NULL auto_increment,
    `size` int(11) NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Fill with 20,000 random numbers:
DELIMITER //
DROP PROCEDURE IF EXISTS autofill//
CREATE PROCEDURE autofill()
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < 20000 DO
        INSERT INTO table1 (size) VALUES (FLOOR(RAND() * 1000));
        SET i = i + 1;
    END WHILE;
END//
DELIMITER ;
CALL autofill();
Check the row count:
SELECT COUNT(*) FROM table1;
+----------+
| COUNT(*) |
+----------+
|    20000 |
+----------+
Run the cumulative total query:
SELECT
    id,
    size,
    @total := @total + size AS cumulativeSize
FROM table1, (SELECT @total := 0) AS t;
+-------+------+----------------+
| id    | size | cumulativeSize |
+-------+------+----------------+
|     1 |  226 |            226 |
|     2 |  869 |           1095 |
|     3 |  668 |           1763 |
|     4 |  733 |           2496 |
...
| 19997 |  966 |       10004741 |
| 19998 |  522 |       10005263 |
| 19999 |  713 |       10005976 |
| 20000 |    0 |       10005976 |
+-------+------+----------------+
20000 rows in set (0.07 sec)
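One caveat worth noting: the running total is only meaningful if the rows arrive in id order, which the query above happens to get from the primary key scan. A minimal sketch of making the order explicit via a derived table (on some later versions the optimizer may merge the derived table away, so it is worth checking the plan):
SELECT
    id,
    size,
    @total := @total + size AS cumulativeSize
FROM (SELECT id, size FROM table1 ORDER BY id) AS ordered,
     (SELECT @total := 0) AS t;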
UPDATE
I'd missed the grouping by groupId in the original question, and that certainly made things a bit trickier. I then wrote a solution which used a temporary table, but I didn't like it—it was messy and overly complicated. I went away and did some more research, and have come up with something far simpler and faster.
I can't claim all the credit for this; in fact, I can barely claim any at all, as it is just a modified version of "Emulate row number" from Common MySQL Queries.
It's beautifully simple, elegant, and very quick:
SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
    SELECT
        fileInfoId,
        groupId,
        name,
        size,
        -- restart the running total whenever the group changes
        @cs := IF(@prev_groupId = groupId, @cs + size, size) AS cumulativeSize,
        @prev_groupId := groupId AS prev_groupId
    FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
    ORDER BY groupId
) AS tmp;
You can remove the outer SELECT ... AS tmp if you don't mind the prev_groupId column being returned; I found that it ran marginally faster without it.
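For reference, the stripped-down version would look something like this (the same query minus the wrapper, so the helper prev_groupId column shows up in the output):
SELECT
    fileInfoId,
    groupId,
    name,
    size,
    @cs := IF(@prev_groupId = groupId, @cs + size, size) AS cumulativeSize,
    @prev_groupId := groupId AS prev_groupId
FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
ORDER BY groupId;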
Here's a simple test case:
INSERT INTO `fileInfo` VALUES
( 1, 3, 'name0', '10'),
( 5, 3, 'name1', '10'),
( 7, 3, 'name2', '10'),
( 8, 1, 'name3', '10'),
( 9, 1, 'name4', '10'),
(10, 2, 'name5', '10'),
(12, 4, 'name6', '10'),
(20, 4, 'name7', '10'),
(21, 4, 'name8', '10'),
(25, 5, 'name9', '10');
SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
    SELECT
        fileInfoId,
        groupId,
        name,
        size,
        @cs := IF(@prev_groupId = groupId, @cs + size, size) AS cumulativeSize,
        @prev_groupId := groupId AS prev_groupId
    FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
    ORDER BY groupId
) AS tmp;
+------------+---------+-------+------+----------------+
| fileInfoId | groupId | name  | size | cumulativeSize |
+------------+---------+-------+------+----------------+
|          8 |       1 | name3 |   10 |             10 |
|          9 |       1 | name4 |   10 |             20 |
|         10 |       2 | name5 |   10 |             10 |
|          1 |       3 | name0 |   10 |             10 |
|          5 |       3 | name1 |   10 |             20 |
|          7 |       3 | name2 |   10 |             30 |
|         12 |       4 | name6 |   10 |             10 |
|         20 |       4 | name7 |   10 |             20 |
|         21 |       4 | name8 |   10 |             30 |
|         25 |       5 | name9 |   10 |             10 |
+------------+---------+-------+------+----------------+
Here's a sample of the last few rows from a 20,000 row table:
|      19481 |     248 | 8CSLJX22RCO | 1037469 |       51270389 |
|      19486 |     248 | 1IYGJ1UVCQE |  937150 |       52207539 |
|      19817 |     248 | 3FBU3EUSE1G |  616614 |       52824153 |
|      19871 |     248 | 4N19QB7PYT  |  153031 |       52977184 |
|        132 |     249 | 3NP9UGMTRTD |  828073 |         828073 |
|        275 |     249 | 86RJM39K72K |  860323 |        1688396 |
|        802 |     249 | 16Z9XADLBFI |  623030 |        2311426 |
...
|      19661 |     249 | ADZXKQUI0O3 |  837213 |       39856277 |
|      19870 |     249 | 9AVRTI3QK6I |  331342 |       40187619 |
|      19972 |     249 | 1MTAEE3LLEM | 1027714 |       41215333 |
+------------+---------+-------------+---------+----------------+
20000 rows in set (0.31 sec)
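As an aside for readers on newer versions: MySQL 8.0 added window functions, which express a per-group running total directly and avoid user variables (whose assignment inside a SELECT is deprecated in 8.0). A minimal sketch against the fileInfo table from the question:
SELECT
    fileInfoId,
    groupId,
    name,
    size,
    SUM(size) OVER (PARTITION BY groupId ORDER BY fileInfoId) AS cumulativeSize
FROM fileInfo
ORDER BY groupId, fileInfoId;
PARTITION BY groupId restarts the sum for each group, and ORDER BY fileInfoId defines the running order within it.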
I think that MySQL is only using one of the indexes on the table. In this case, it's choosing the index on foreignId.
Add a covering compound index that includes both primaryId and foreignId.
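With the schema from the updated question, where those columns are fileInfoId and groupId, that might translate to something like the following (the index name is arbitrary, and size is included here as an extra so the index also covers the SUM(b.size) read):
ALTER TABLE fileInfo
    ADD INDEX idx_group_file_size (groupId, fileInfoId, size);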