Use WITH to loop over a set of data in SQL - sql

Given the table below, I'm trying to walk back from a given row to the first iteration in its chain of iterations.
+--------+-----------------+----------------+
| id     | nextiterationId | iterationCount |
+--------+-----------------+----------------+
| 110001 | 110002          | 0              |
| 110002 | 110003          | 1              |
| 110003 | 110004          | 2              |
| 110004 | 1               | 3              |
+--------+-----------------+----------------+
If I call a stored procedure or function with one of the id values, I need it to return the prior iterations of that id, walking back until iterationCount = 0.
For example, if I pass 110003 as the parameter, the first thing it should look up is the row whose nextIterationId is 110003. That is the first loop.
Since iterationCount is not 0 yet, it keeps looping. Next it looks for the row whose nextIterationId is 110002 (the id found in the first loop), so the second loop finds id 110001 and returns it. Because that row has iterationCount = 0, the loop stops.
It's fine that calling the SP/function with 110003, the 3rd iteration, does not return 110004, the 4th iteration. I only need it to go backwards from the given id.
A while ago I did this with WITH, possibly combined with WHILE, but I can't recall how. I need the result in a form that I can use inside a larger SELECT statement.

Here is a recursive cte solution. Let me know if it needs any tweaks.
--Throwing your table into a temp table
CREATE TABLE #yourTable (ID INT, NextIterationID INT, IterationCount INT);
INSERT INTO #yourTable
VALUES
    (110001, 110002, 0),
    (110002, 110003, 1),
    (110003, 110004, 2),
    (110004, 1, 3);
--Now for the actual work
--Here is your parameter
DECLARE @param INT = 110003;
--Recursive CTE
WITH yourCTE
AS
(
    --Initial row: the iteration that points at @param
    SELECT ID,
           NextIterationID,
           IterationCount
    FROM #yourTable
    WHERE NextIterationID = @param
    UNION ALL
    --Finding all previous iterations
    SELECT #yourTable.*
    FROM #yourTable
    INNER JOIN yourCTE
        ON yourCTE.ID = #yourTable.NextIterationID
    --A WHERE clause is not really necessary: once there are no more previous iterations, the recursion stops on its own
    --WHERE yourCTE.IterationCount > 0
)
SELECT *
FROM yourCTE;
--Cleanup
DROP TABLE #yourTable;
Results:
ID NextIterationID IterationCount
----------- --------------- --------------
110002 110003 1
110001 110002 0
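For readers without SQL Server at hand, the same backward walk can be sketched with SQLite's WITH RECURSIVE through Python's sqlite3 module. This is only an illustration under assumptions: the table name `iterations`, the snake_case column names and the helper `prior_iterations` are hypothetical, and the explicit stop condition on iteration_count is optional, as noted in the answer.

```python
import sqlite3


def prior_iterations(conn, start_id):
    """Return the rows that precede start_id in the chain, newest first."""
    sql = """
        WITH RECURSIVE chain(id, next_iteration_id, iteration_count) AS (
            -- anchor member: the row that points at the given id
            SELECT id, next_iteration_id, iteration_count
            FROM iterations
            WHERE next_iteration_id = ?
            UNION ALL
            -- recursive member: the row that points at the row found so far
            SELECT t.id, t.next_iteration_id, t.iteration_count
            FROM iterations AS t
            JOIN chain AS c ON t.next_iteration_id = c.id
            WHERE c.iteration_count > 0  -- explicit stop at the first iteration
        )
        SELECT id, next_iteration_id, iteration_count
        FROM chain
        ORDER BY iteration_count DESC
    """
    return conn.execute(sql, (start_id,)).fetchall()


# Hypothetical in-memory copy of the sample data from the question
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE iterations "
    "(id INTEGER, next_iteration_id INTEGER, iteration_count INTEGER)"
)
conn.executemany(
    "INSERT INTO iterations VALUES (?, ?, ?)",
    [(110001, 110002, 0), (110002, 110003, 1),
     (110003, 110004, 2), (110004, 1, 3)],
)

rows = prior_iterations(conn, 110003)
print(rows)
```

Because the function returns ordinary rows, the same CTE text could also be embedded in a larger SELECT, which is what the question asks for.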

Related

SQL - Set column value to the SUM of all references

I want the column "CurrentCapacity" to hold the SUM of a specific column over all rows that reference it.
Let's say there are three rows in SecTable that all have FirstTableID = 1, with Size values 1, 1 and 3.
The row in FirstTable that has ID = 1 should then have a value of 5 in the CurrentCapacity column.
How can I do this, and how can I make it happen automatically on insert, update and delete?
Thanks!
FirstTable
+----+-------------+-------------------------+
| ID | MaxCapacity | CurrentCapacity |
+----+-------------+-------------------------+
| 1 | 5 | 0 (desired result = 5) |
+----+-------------+-------------------------+
| 2 | 5 | 0 |
+----+-------------+-------------------------+
| 3 | 5 | 0 |
+----+-------------+-------------------------+
SecTable
+----+-------------------+------+
| ID | FirstTableID (FK) | Size |
+----+-------------------+------+
| 1 | 1 | 2 |
+----+-------------------+------+
| 2 | 1 | 3 |
+----+-------------------+------+
In general, a view is a better solution than trying to keep a calculated column up-to-date. For your example, you could use this:
CREATE VIEW capacity AS
SELECT f.ID, f.MaxCapacity, COALESCE(SUM(s.Size), 0) AS CurrentCapacity
FROM FirstTable f
LEFT JOIN SecTable s ON s.FirstTableID = f.ID
GROUP BY f.ID, f.MaxCapacity
Then you can simply
SELECT *
FROM capacity
to get the results you desire. For your sample data:
ID MaxCapacity CurrentCapacity
1 5 5
2 5 0
3 5 0
I got this to work with the following trigger:
CREATE TRIGGER UpdateCurrentCapacity
ON SecTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Recalculates every FirstTable row with ID 1 to 100 on each change
    DECLARE @Iteration INT;
    SET @Iteration = 1;
    WHILE @Iteration <= 100
    BEGIN
        UPDATE FirstTable
        SET FirstTable.CurrentCapacity = (SELECT COALESCE(SUM(SecTable.Size), 0) FROM SecTable WHERE FirstTableID = @Iteration)
        WHERE ID = @Iteration;
        SET @Iteration = @Iteration + 1;
    END
END
GO
Personally, I would use neither a trigger nor a stored CurrentCapacity value, since storing a derived value breaks normalization rules for database design. You have a relation and can already get the results by creating a view or by making CurrentCapacity a calculated column.
Your view can look like this:
SELECT Id, MaxCapacity, ISNULL(O.SumSize,0) AS CurrentCapacity
FROM dbo.FirstTable FT
OUTER APPLY
(
SELECT ST.FirstTableId, SUM(ST.Size) as SumSize FROM SecTable ST
WHERE ST.FirstTableId = FT.Id
GROUP BY ST.FirstTableId
) O
Sure, you could fire a proc every time a row is updated/inserted or deleted in the second table and recalculate the column, but you might as well calculate it on the fly. If it's not required to have the column accurate, you can have a job update the values every X hours. You could combine this with your view to have both a "live" and "cached" version of the capacity data.
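To illustrate why the view needs no trigger, here is a small sketch that runs the first answer's view against SQLite through Python's sqlite3 module. The column types, the extra insert and delete, and the helper `capacities` are assumptions made for the demo; SQL Server behaviour is not being tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FirstTable (ID INTEGER PRIMARY KEY, MaxCapacity INTEGER);
    CREATE TABLE SecTable (
        ID INTEGER PRIMARY KEY,
        FirstTableID INTEGER REFERENCES FirstTable(ID),
        Size INTEGER
    );
    INSERT INTO FirstTable VALUES (1, 5), (2, 5), (3, 5);
    INSERT INTO SecTable VALUES (1, 1, 2), (2, 1, 3);

    -- Same definition as the view in the answer above
    CREATE VIEW capacity AS
    SELECT f.ID, f.MaxCapacity, COALESCE(SUM(s.Size), 0) AS CurrentCapacity
    FROM FirstTable f
    LEFT JOIN SecTable s ON s.FirstTableID = f.ID
    GROUP BY f.ID, f.MaxCapacity;
""")


def capacities(conn):
    """Read the view; the sums are computed at query time."""
    return conn.execute(
        "SELECT ID, MaxCapacity, CurrentCapacity FROM capacity ORDER BY ID"
    ).fetchall()


before = capacities(conn)

# Change SecTable; nothing has to be recalculated by hand
conn.execute("DELETE FROM SecTable WHERE ID = 2")
conn.execute("INSERT INTO SecTable VALUES (3, 2, 4)")

after = capacities(conn)
print(before)
print(after)
```

The trade-off is the one the answer describes: the sum is recomputed on every read, so a cached column only pays off if reads are far more frequent than writes and slightly stale values are acceptable.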

SQL: insert to unique column with shift

I need to store data with a varchar name and an integer intValue. All integer values are unique, and I need to preserve that contract.
I need to write a query that adds an element using the following rule: if the insertion would duplicate an intValue, increase the intValue of the existing element to resolve the conflict, and repeat until no conflict is left.
Example:
| B | 2 |                          | B | 2 |
| C | 3 |                          | E | 3 |
| D | 4 |  => insert (E 3) =>      | C | 4 |
| A | 1 |                          | D | 5 |
| Z | 7 |                          | A | 1 |
                                   | Z | 7 |
The only idea I have is to run an update query in a loop, but that looks too inefficient.
I need to write this query in Spring JPA, so the only requirement is that the query should not be database specific.
Business case:
Let's say there are people in a queue, and intValue is the position in the queue. "Add" means that a person comes, pays money and says: I don't want to be last in the queue, I want to be, for example, 3rd. So you take the money and put that person in the queue at that position, and the people behind him have their positions incremented.
The only difference from a real queue is that in my case gaps are allowed.
Aha, we might say that the gaps are caused by people leaving the queue.
Let's try this. Loops are inevitable: either the server does them, or we can do them in SQL.
-- prepare test data
DECLARE @PeopleQueue TABLE (pqname varchar(100), intValue int);
INSERT INTO @PeopleQueue
SELECT 'B' AS pqname, 2 AS intValue UNION ALL
SELECT 'C' AS pqname, 3 AS intValue UNION ALL
SELECT 'D' AS pqname, 4 AS intValue UNION ALL
SELECT 'A' AS pqname, 1 AS intValue UNION ALL
SELECT 'Z' AS pqname, 7 AS intValue
;
--SELECT '' AS pqname, 0 as intValue UNION ALL
SELECT * FROM @PeopleQueue; -- verify good test data
-- Solve the problem
DECLARE @pqnameNEW varchar(100) = 'E';
DECLARE @intNEW int = 3; -- 3 for conflict, or for no conflict, use 13
DECLARE @intHIGH int;
IF EXISTS ( SELECT 1 FROM @PeopleQueue WHERE intValue = @intNEW )
BEGIN
    -- find the end of the sequence, before the gap
    SET @intHIGH = (
        SELECT TOP 1
            intValue
        FROM @PeopleQueue pq
        WHERE NOT EXISTS
        (
            SELECT NULL
            FROM @PeopleQueue pn
            WHERE pn.intValue = pq.intValue + 1
        )
        AND pq.intValue >= @intNEW
        ORDER BY pq.intValue -- lowest qualifying value, so TOP 1 is deterministic
    )
    ;
    -- now update all from intNEW through intHIGH
    UPDATE @PeopleQueue
    SET intValue = intValue + 1
    WHERE intValue >= @intNEW
    AND intValue <= @intHIGH
END;
-- finally insert the new item
INSERT INTO @PeopleQueue VALUES (@pqnameNEW, @intNEW);
SELECT * FROM @PeopleQueue; -- verify correct solution
Edited--11/28 17:00
Or, estimate the number of Bump-the-Line-Inserts (vs append to the end inserts), and design the intValues to be originally in multiples of ten (10) so that long sequences of updates are minimized.
UPDATE queue
SET intValue = intValue + 1
WHERE intValue >= 3
AND intValue <= (
    -- last value of the contiguous run that starts at 3
    SELECT q1.intValue
    FROM queue AS q1 LEFT JOIN queue AS q2 ON q1.intValue + 1 = q2.intValue
    WHERE q2.name IS NULL AND q1.intValue >= 3
    ORDER BY q1.intValue
    LIMIT 1
)
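The two answers above share one idea: find the end of the contiguous run that starts at the requested position, shift only that run by one, then insert. A minimal sketch of that idea, using SQLite through Python's sqlite3 module, could look like this. The table `queue`, its columns and the helper `insert_with_shift` are hypothetical names, and the table deliberately has no UNIQUE constraint, because some engines check uniqueness row by row while the UPDATE is still in progress.

```python
import sqlite3


def insert_with_shift(conn, name, position):
    """Insert (name, position), bumping the contiguous run that starts at position."""
    with conn:  # one transaction: shift and insert succeed or fail together
        taken = conn.execute(
            "SELECT 1 FROM queue WHERE intValue = ?", (position,)
        ).fetchone()
        if taken:
            # End of the run: smallest value >= position that has no successor
            (high,) = conn.execute(
                """
                SELECT MIN(q1.intValue)
                FROM queue AS q1
                WHERE q1.intValue >= ?
                  AND NOT EXISTS (
                      SELECT 1 FROM queue AS q2
                      WHERE q2.intValue = q1.intValue + 1
                  )
                """,
                (position,),
            ).fetchone()
            conn.execute(
                "UPDATE queue SET intValue = intValue + 1 "
                "WHERE intValue BETWEEN ? AND ?",
                (position, high),
            )
        conn.execute(
            "INSERT INTO queue (name, intValue) VALUES (?, ?)", (name, position)
        )


def snapshot(conn):
    """Return the queue ordered by position."""
    return conn.execute(
        "SELECT name, intValue FROM queue ORDER BY intValue"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queue (name TEXT, intValue INTEGER)")
conn.executemany(
    "INSERT INTO queue VALUES (?, ?)",
    [("B", 2), ("C", 3), ("D", 4), ("A", 1), ("Z", 7)],
)

insert_with_shift(conn, "E", 3)
rows = snapshot(conn)
print(rows)
```

With the sample data, only C and D move; Z stays at 7 because the gap after 5 absorbs the shift.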

How to get IDs of my batch update SQL Server

How can I get the IDs of the rows affected by my batch update? I'm trying to insert a record into table tbl.history for every update/transaction.
Below is my sample table:
table tbl.myTable
+------+-----------+------------+
| ID | Amount | Date |
+------+-----------+------------+
| 1 | 100 | 01/01/2019 |
+------+-----------+------------+
| 2 | 200 | 01/02/2019 |
+------+-----------+------------+
| 3 | 500 | 01/01/2019 |
+------+-----------+------------+
| 5 | 500 | 01/05/2019 |
+------+-----------+------------+
Here's my batch update query:
Update tbl.myTable set Amount = 0 where Date = '01/01/2019'
This query updates the two rows with ID 1 and 3. How can I get those IDs so that I can insert them into another table (tbl.history)?
Use the OUTPUT clause. It provides you with a "table" named deleted which contains the values before the update, and a "table" named inserted which contains the new values.
So, you can run
Update tbl.myTable set Amount = 0
output inserted.*,deleted.*
where Date = '01/01/2019'
Run that first to see how it works. After that, you can create a temporary table and OUTPUT the fields you want INTO it:
Update tbl.myTable set Amount = 0
output inserted.*,deleted.* into temp_table_with_updated
where Date = '01/01/2019'
You can do this by using OUTPUT
DECLARE @outputIDs AS TABLE
(
    ID int
)
UPDATE tbl.MyTable SET [Amount] = 0
OUTPUT INSERTED.ID INTO @outputIDs
WHERE [Date] = '01/01/2019'
The @outputIDs table will have the two updated IDs.
Use a caching mechanism (table variable, CTE etc.)
DECLARE @temp TABLE (id int)
INSERT INTO @temp SELECT id FROM tbl.myTable WHERE Date = '01/01/2019'
UPDATE tbl.myTable SET Amount = 0 WHERE id IN (SELECT id FROM @temp)
-- do more stuff with the id's
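The "capture the IDs first" approach from the last answer can be sketched outside SQL Server as well. The example below uses SQLite through Python's sqlite3 module; the `history` table layout, the note text and the helper `zero_amounts_with_history` are assumptions for illustration, not part of the original schema.

```python
import sqlite3


def zero_amounts_with_history(conn, date):
    """Set Amount to 0 for all rows on the given date and log each affected ID."""
    with conn:  # update and history rows are committed together
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT ID FROM myTable WHERE Date = ? ORDER BY ID", (date,)
            )
        ]
        conn.executemany(
            "UPDATE myTable SET Amount = 0 WHERE ID = ?", [(i,) for i in ids]
        )
        conn.executemany(
            "INSERT INTO history (myTableID, note) VALUES (?, ?)",
            [(i, "amount reset") for i in ids],
        )
    return ids


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Amount INTEGER, Date TEXT);
    -- hypothetical history layout
    CREATE TABLE history (myTableID INTEGER, note TEXT);
    INSERT INTO myTable VALUES
        (1, 100, '01/01/2019'),
        (2, 200, '01/02/2019'),
        (3, 500, '01/01/2019'),
        (5, 500, '01/05/2019');
""")

affected = zero_amounts_with_history(conn, "01/01/2019")
print(affected)
```

On SQL Server itself, the OUTPUT clause shown above is usually the more direct choice, since it reports exactly the rows the UPDATE touched without a separate SELECT.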

Finding & updating duplicate rows

I need to implement a query (or maybe a stored procedure) that will perform soft de-duplication of data in one of my tables. If any two records are similar enough, I need to "squash" them: deactivate one and update another.
The similarity is based on a score. Score is calculated the following way:
from both records, take values of column A,
values equal? add A1 to the score,
values not equal? subtract A2 from the score,
move on to the next column.
Once all desired value pairs have been checked:
is the resulting score more than X?
yes – records are duplicates, mark the older record as "duplicate"; append its id to the duplicate_ids column of the newer record.
no – do nothing.
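The scoring rule above can be written down as a small function to make the arithmetic concrete. This is only a sketch in Python: the column names, the per-column (reward, penalty) pairs standing in for A1 and A2, the `status` field and the threshold are all hypothetical values chosen for the example.

```python
def similarity_score(rec_a, rec_b, weights):
    """Add the reward for each matching column, subtract the penalty otherwise."""
    score = 0
    for column, (reward, penalty) in weights.items():
        if rec_a.get(column) == rec_b.get(column):
            score += reward
        else:
            score -= penalty
    return score


def squash(older, newer, weights, threshold):
    """Mark older as a duplicate of newer when the score exceeds the threshold."""
    if similarity_score(older, newer, weights) > threshold:
        older["status"] = "duplicate"
        newer.setdefault("duplicate_ids", []).append(older["id"])
        return True
    return False


# Hypothetical weights: column -> (reward if equal, penalty if different)
weights = {"first_name": (3, 1), "last_name": (5, 2), "email": (10, 4)}

ann_old = {"id": 1, "first_name": "Ann", "last_name": "Lee", "email": "ann@example.com"}
ann_new = {"id": 2, "first_name": "Ann", "last_name": "Lee", "email": "a.lee@example.com"}
bob = {"id": 3, "first_name": "Bob", "last_name": "Lee", "email": "bob@example.com"}

print(similarity_score(ann_old, ann_new, weights))  # 3 + 5 - 4
print(squash(ann_old, ann_new, weights, threshold=3))
print(squash(ann_old, bob, weights, threshold=3))
```

Comparing every pair this way is quadratic in the number of records, which is the performance concern raised in the question; the function only pins down what "score" means.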
How would I approach solving this task in SQL?
The table in question is called people. People records are entered by different admins. The de-duplication process exists to make sure that no person exists twice in the system.
The motivation for the task is simple: performance.
Right now the solution is implemented in a scripting language via several sub-par SQL queries with logic on top of them. However, the volume of data is expected to grow to tens of millions of records, and the script will eventually become very slow (it should run via cron every night).
I'm using postgresql.
It appears that de-duplication is a tough problem in general.
I found this: https://github.com/dedupeio/dedupe. There's a good description of how this works: https://dedupe.io/documentation/how-it-works.html.
I'm going to explore dedupe. I'm not going to try to implement it in SQL.
If I get you correctly, this could help.
You can use PostgreSQL Window Functions to get all the duplicates and use "weights" to determine which records are duplicated so you can do whatever you like with them.
Here is an example:
-- Temporary table for the test, primary key is id and
-- we have A,B,C columns with a creation date:
CREATE TEMP TABLE test
(id serial, "colA" text, "colB" text, "colC" text,creation_date date);
-- Insert test data:
INSERT INTO test ("colA", "colB", "colC",creation_date) VALUES
('A','B','C','2017-05-01'),('D','E','F','2017-06-01'),('A','B','D','2017-08-01'),
('A','B','R','2017-09-01'),('C','J','K','2017-09-01'),('A','C','J','2017-10-01'),
('C','W','K','2017-10-01'),('R','T','Y','2017-11-01');
-- SELECT * FROM test
-- id | colA | colB | colC | creation_date
-- ----+-------+-------+-------+---------------
-- 1 | A | B | C | 2017-05-01
-- 2 | D | E | F | 2017-06-01
-- 3 | A | B | D | 2017-08-01 <-- Duplicate A,B
-- 4 | A | B | R | 2017-09-01 <-- Duplicate A,B
-- 5 | C | J | K | 2017-09-01
-- 6 | A | C | J | 2017-10-01
-- 7 | C | W | K | 2017-10-01 <-- Duplicate C,K
-- 8 | R | T | Y | 2017-11-01
-- Here is the query you can use to get the id's from the duplicate records
-- (the comments are backwards):
-- third, you select the id of the duplicates
SELECT id
FROM
(
-- Second, select all the columns needed and weight the duplicates.
-- You don't need to select every column, if only the id is needed
-- then you can only select the id
-- Query this SQL to see results:
SELECT
id,"colA", "colB", "colC",creation_date,
-- The weights are simple, if the row count is more than 1 then assign 1,
-- if the row count is 1 then assign 0, sum all and you have a
-- total weight of 'duplicity'.
CASE WHEN "num_colA">1 THEN 1 ELSE 0 END +
CASE WHEN "num_colB">1 THEN 1 ELSE 0 END +
CASE WHEN "num_colC">1 THEN 1 ELSE 0 END as weight
FROM
(
-- First, select using window functions and assign a row number.
-- You can run this query separately to see results
SELECT *,
-- NOTE that it is order by id, if needed you can order by creation_date instead
row_number() OVER(PARTITION BY "colA" ORDER BY id) as "num_colA",
row_number() OVER(PARTITION BY "colB" ORDER BY id) as "num_colB",
row_number() OVER(PARTITION BY "colC" ORDER BY id) as "num_colC"
FROM test ORDER BY id
) count_column_duplicates
) duplicates
-- HERE IS DEFINED WHICH WEIGHT TO SELECT, for the test,
-- it selects the rows whose weight is more than 1
WHERE weight>1
-- The total SQL returns all the duplicates according to the selected weight:
-- id
-- ----
-- 3
-- 4
-- 7
You can add this query to a stored procedure so you can run it whenever you like. Hope it helps.

Reducing values in one table until reserves depleted in another - recursion?

I have two tables - let's call them dbo.ValuesToReduce and dbo.Reserve
The data in the first table (dbo.ValuesToReduce) is:
ValuesToReduceId | PartnerId | Value
-------------------------------------
1 | 1 | 53.15
2 | 2 | 601.98
3 | 1 | 91.05
4 | 2 | 44.56
5 | 3 | 19.11
The second table (dbo.Reserve) looks like this
ReserveId | PartnerId | Value
-------------------------------
1 | 1 | -101.55
2 | 2 | -425.19
3 | 3 | -28.17
What I need to do is: update the Values in ValuesToReduce table using the latter table of Reserves, reducing the numbers until the reserve supply is exhausted. Here's what I should get after running the script:
ValuesToReduceId | PartnerId | Value
-------------------------------------
1 | 1 | 0.00
2 | 2 | 176.79
3 | 1 | 42.65
4 | 2 | 44.56
5 | 3 | 0.00
ReserveId | PartnerId | Value
-------------------------------
1 | 1 | 0.00
2 | 2 | 0.00
3 | 3 | -9.06
So basically, every partner has a "reserve" which he can deplete, and the values in the value table should be reduced per partner as long as there is still something left in the reserves. Reserves should be allocated in the order given by ValuesToReduceId.
For partner with PartnerId of 1, you can see that he had enough reserve to update his first value to 0 and still had some left to reduce the second value by that amount.
Partner with ID of 2 had a reserve of 425.19, and there were two entries in the values table for that partner, 601.98 and 44.56, in that order (by ValuesToReduceId), so we only updated the first value since the reserve is not big enough for both. The wrong way would have been to update the second value to 0.00 and the first to 221.35.
Partner with ID of 3 has more than enough reserve, so after updating his value to 0, he's left with -9.06.
I tried something with a recursive CTE, but I can't seem to get my head around it.
I hope I described the problem clearly enough.
You cannot, as far as I know, update two tables in a single statement.
But you could do this in SQL using a WHILE loop. Search for the first possible transaction, carry it out, and repeat until there are no possible transactions left.
declare @valid int
declare @resid int
declare @val float
while 1 = 1
begin
    select top 1
        @resid = r.ReserveId
    ,   @valid = v.ValuesToReduceId
    ,   @val = CASE WHEN -r.Value > v.Value THEN v.Value ELSE -r.Value END
    from ValuesToReduce v
    inner join Reserves r on r.PartnerId = v.PartnerId
    where r.Value < 0 and v.Value > 0
    -- lowest ValuesToReduceId first, so reserves are spent in the required order
    order by v.ValuesToReduceId
    if @@rowcount = 0
        break
    update ValuesToReduce
    set Value = Value - @val
    where ValuesToReduceId = @valid
    update Reserves
    set Value = Value + @val
    where ReserveId = @resid
end
Here's code to create test tables:
create table ValuesToReduce (
ValuesToReduceId int,
PartnerId int,
Value float
)
insert into ValuesToReduce values (1,1,53.15)
insert into ValuesToReduce values (2,2,601.98)
insert into ValuesToReduce values (3,1,91.05)
insert into ValuesToReduce values (4,2,44.56)
insert into ValuesToReduce values (5,3,19.11)
create table Reserves (
ReserveId int,
PartnerId int,
Value float
)
insert into Reserves values (1,1,-101.55)
insert into Reserves values (2,2,-425.19)
insert into Reserves values (3,3,-28.17)
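The expected results in the question can be cross-checked outside the database with a few lines of Python. This sketch only models the allocation rule (spend each partner's reserve in ValuesToReduceId order); the function `apply_reserves` and its data layout are hypothetical, and Decimal is used instead of float so the cents come out exactly.

```python
from decimal import Decimal


def apply_reserves(values, reserves):
    """Reduce values per partner, in id order, until each reserve is used up.

    values:   list of (values_to_reduce_id, partner_id, value)
    reserves: dict of partner_id -> reserve (negative or zero)
    """
    # Work with the reserve as a positive budget per partner
    remaining = {partner: -amount for partner, amount in reserves.items()}
    reduced = []
    for value_id, partner, value in sorted(values):
        budget = remaining.get(partner, Decimal("0"))
        used = min(value, budget)
        remaining[partner] = budget - used
        reduced.append((value_id, partner, value - used))
    # Convert the leftover budgets back to the negative convention
    leftover = {partner: -amount for partner, amount in remaining.items()}
    return reduced, leftover


values = [
    (1, 1, Decimal("53.15")),
    (2, 2, Decimal("601.98")),
    (3, 1, Decimal("91.05")),
    (4, 2, Decimal("44.56")),
    (5, 3, Decimal("19.11")),
]
reserves = {1: Decimal("-101.55"), 2: Decimal("-425.19"), 3: Decimal("-28.17")}

reduced, leftover = apply_reserves(values, reserves)
for row in reduced:
    print(row)
print(leftover)
```

The output matches the tables in the question: partner 2's second value stays at 44.56 because the reserve is already exhausted by the first one.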