Calculating Formula with CTE tree - sql

Data
I have the following partial data:
id  parent  multiplier  const
--  ------  ----------  -----
1   NULL    1.10        1.00
2   1       1.20        2.00
3   1       1.30        3.00
4   1       2.40        4.00
5   2       2.50        5.00
6   2       2.60        6.00
7   2       2.70        17.00
8   3       2.80        18.00
9   3       3.90        19.00
10  3       3.10        7.00
11  8       3.20        8.00
12  8       3.30        9.00
13  8       3.40        10.00
14  9       4.50        11.00
15  10      4.60        21.00
16  10      4.70        22.00
This can be displayed as the following tree:
1
+-- 2
| +-- 5
| +-- 6
| +-- 7
|
+-- 3
| +-- 8
| | +-- 11
| | +-- 12
| | +-- 13
| |
| +-- 9
| | +-- 14
| |
| +-- 10
| +-- 15
| +-- 16
|
+-- 4
SQL to create the table structure and data
DECLARE @table TABLE (Id int, Parent int, multiplier decimal(6,3), Const decimal(6,3));
INSERT INTO @table
SELECT 1, NULL, 1.1, 1.00 UNION
SELECT 2, 1, 1.2, 2.00 UNION
SELECT 3, 1, 1.3, 3.00 UNION
SELECT 4, 1, 2.4, 4.00 UNION
SELECT 5, 2, 2.5, 5.00 UNION
SELECT 6, 2, 2.6, 6.00 UNION
SELECT 7, 2, 2.7, 17.00 UNION
SELECT 8, 3, 2.8, 18.00 UNION
SELECT 9, 3, 3.9, 19.00 UNION
SELECT 10, 3, 3.1, 7.00 UNION
SELECT 11, 8, 3.2, 8.00 UNION
SELECT 12, 8, 3.3, 9.00 UNION
SELECT 13, 8, 3.4, 10.00 UNION
SELECT 14, 9, 4.5, 11.00 UNION
SELECT 15, 10, 4.6, 21.00 UNION
SELECT 16, 10, 4.7, 22.00;
Problem
I need to evaluate the formula aX+b recursively up to the root for any node in the tree. In other words, I calculate the formula for the child node, pass the result up to the parent as X, and repeat until I reach the root.
For example, starting with x = 1250.00 at node 14 gives:
1.10 * (1.30 *( 3.90 * (4.50 * 1250.00 + 11.00) + 19.00) + 3.00) + 1.00 = 31463.442
Currently I am doing this with a CTE tree and C#; however, I am not satisfied with its speed and optimization.
Question
Can I do this calculation on SQL server and just return the value? If it is possible, what is the tree depth that I can navigate with CTE?

Can I do this calculation on SQL server and just return the value?
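Yes. Start the recursion at the leaf node, do the calculation as you go, and take the max value in the main query. (Taking the max returns the root's value only while x grows at every step, as it does with the sample data.)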
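declare @x decimal(19, 3) = 1250.00; -- starting value at the leaf

with C as
(
  -- anchor: apply the leaf node's formula to @x
  select T.Id,
         T.Parent,
         cast(T.multiplier * @x + T.Const as decimal(19, 3)) as x
  from @table as T
  where T.Id = 14
  union all
  -- recursive step: feed the child's result into the parent's formula
  select T.Id,
         T.Parent,
         cast(T.multiplier * C.x + T.Const as decimal(19, 3))
  from C
    inner join @table as T
      on C.Parent = T.Id
)
select max(C.x) as Value
from C
option (maxrecursion 0);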
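As a cross-check of the arithmetic outside SQL, here is a minimal Python sketch of the same leaf-to-root walk. The NODES dictionary and the evaluate_to_root name are made up for this illustration, and only the rows on the path from node 14 to the root are included.

```python
from decimal import Decimal

# Hypothetical in-memory copy of the rows on the path 14 -> 9 -> 3 -> 1.
# id -> (parent, multiplier, const)
NODES = {
    1: (None, Decimal("1.10"), Decimal("1.00")),
    3: (1, Decimal("1.30"), Decimal("3.00")),
    9: (3, Decimal("3.90"), Decimal("19.00")),
    14: (9, Decimal("4.50"), Decimal("11.00")),
}


def evaluate_to_root(node_id, x):
    """Apply multiplier * x + const at each node, walking up to the root."""
    while node_id is not None:
        parent, multiplier, const = NODES[node_id]
        x = multiplier * x + const
        node_id = parent
    return x


result = evaluate_to_root(14, Decimal("1250.00"))
print(result)  # numerically equal to 31463.442, the value in the question
```

The loop returns the value computed at the root directly, so it does not rely on the root's value being the largest one.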
If it is possible, what is the tree depth that I can navigate with CTE?
The default is 100, but you can change that with maxrecursion. With option (maxrecursion 0) there is no limit.
I am not satisfied with its speed and optimization.
To fix that you have to show what you actually do. The sample you have provided gets a good plan if you have a clustered primary key on Id.
It does a seek to find the anchor and seeks for each iteration.

Related

Postgresql compare two rows recursively

I want to write a query that tracks the downgraded versions for each id.
Here is the table:
id version ts
1 3 2021-09-01 10:47:50+00
1 5 2021-09-05 10:47:50+00
1 1 2021-09-11 10:47:50+00
2 2 2021-09-11 10:47:50+00
2 6 2021-09-15 10:47:50+00
3 2 2021-09-01 10:47:50+00
3 4 2021-09-05 10:47:50+00
3 6 2021-09-15 10:47:50+00
3 1 2021-09-16 10:47:50+00
I want to print out something like this:
id:1 downgraded their version from 5 to 1 at 2021-09-11 10:47:50+00
id:3 downgraded their version from 6 to 1 at 2021-09-16 10:47:50+00
So when I run the query the output should be:
id version downgraded_to ts
1 5 1 2021-09-11 10:47:50+00
3 6 1 2021-09-16 10:47:50+00
but I'm completely lost here.
Does it make sense to handle this situation in Postgresql? Is it possible to do it?
You may use the lead analytic function to get the next version and compare it with the current version, assuming that version is of a numeric type.
with next_vers as (
select t.*, lead(version) over(partition by id order by ts asc) as next_version
from(values
(1, 3, timestamp '2021-09-01 10:47:50'),
(1, 5, timestamp '2021-09-05 10:47:50'),
(1, 1, timestamp '2021-09-11 10:47:50'),
(2, 2, timestamp '2021-09-11 10:47:50'),
(2, 6, timestamp '2021-09-15 10:47:50'),
(3, 2, timestamp '2021-09-01 10:47:50'),
(3, 4, timestamp '2021-09-05 10:47:50'),
(3, 6, timestamp '2021-09-15 10:47:50'),
(3, 1, timestamp '2021-09-16 10:47:50')
) as t(id, version, ts)
)
select *
from next_vers
where version > next_version
id | version | ts | next_version
-: | ------: | :------------------ | -----------:
1 | 5 | 2021-09-05 10:47:50 | 1
3 | 6 | 2021-09-15 10:47:50 | 1
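To make the pairing that lead() performs explicit, here is a small Python sketch under the same sample data. It sorts each id's rows by timestamp and compares each row with the one after it. The find_downgrades name is hypothetical, and unlike the query above it reports the timestamp of the later row, which is the time the downgrade happened.

```python
from itertools import groupby

# Sample rows from the question: (id, version, ts as an ISO-like string)
ROWS = [
    (1, 3, "2021-09-01 10:47:50"),
    (1, 5, "2021-09-05 10:47:50"),
    (1, 1, "2021-09-11 10:47:50"),
    (2, 2, "2021-09-11 10:47:50"),
    (2, 6, "2021-09-15 10:47:50"),
    (3, 2, "2021-09-01 10:47:50"),
    (3, 4, "2021-09-05 10:47:50"),
    (3, 6, "2021-09-15 10:47:50"),
    (3, 1, "2021-09-16 10:47:50"),
]


def find_downgrades(rows):
    """Return (id, version, downgraded_to, ts) for every drop in version."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # partition by id, order by ts
    found = []
    for row_id, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        # zip pairs each row with its successor, like lead(version) does
        for current, following in zip(group, group[1:]):
            if current[1] > following[1]:
                found.append((row_id, current[1], following[1], following[2]))
    return found


downgrades = find_downgrades(ROWS)
for row in downgrades:
    print(row)
```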

Sum and Count by month, shown with last day of that month

I have a transaction table like this:
Trandate channelID branch amount
--------- --------- ------ ------
01/05/2019 1 2 2000
11/05/2019 1 2 2200
09/03/2020 1 2 5600
15/03/2020 1 2 600
12/10/2019 2 10 12000
12/10/2019 2 10 12000
15/11/2019 4 7 4400
15/02/2020 4 2 2500
I need to sum amount and count transactions by year and month. I tried this:
select DISTINCT
DATEPART(YEAR,a.TranDate) as [YearT],
DATEPART(MONTH,a.TranDate) as [monthT],
count(*) as [countoftran],
sum(a.Amount) as [amount],
a.Name as [branch],
a.ChannelName as [channelID]
from transactions as a
where a.TranDate>'20181231'
group by a.Name, a.ChannelName, DATEPART(YEAR,a.TranDate), DATEPART(MONTH,a.TranDate)
order by a.Name, YearT, MonthT
It works like a charm. However, I will use this data in Power BI, and I cannot show these results in a line chart because the year and month are in separate columns.
I tried changing the format in SQL to 'YYYYMM', but Power BI doesn't recognise this column as a date.
So, in the end, I need a result table that looks like this:
YearT       channelID  branch  Tamount  TranT
----------  ---------  ------  -------  -----
31/05/2019  1          2       4200     2
31/03/2020  1          2       6200     2
31/10/2019  2          10      24000    2
30/11/2019  4          7       4400     1
29/02/2020  4          2       2500     1
I have tried several little changes with no result.
Help is much appreciated.
You may try with the following statement:
SELECT
EOMONTH(DATEFROMPARTS(YEAR(Trandate), MONTH(Trandate), 1)) AS YearT,
branch, channelID,
SUM(amount) AS TAmount,
COUNT(*) AS TranT
FROM (VALUES
('20190501', 1, 2, 2000),
('20190511', 1, 2, 2200),
('20200309', 1, 2, 5600),
('20200315', 1, 2, 600),
('20191012', 2, 10, 12000),
('20191012', 2, 10, 12000),
('20191115', 4, 7, 4400),
('20200215', 4, 2, 2500)
) v (Trandate, channelID, branch, amount)
GROUP BY DATEFROMPARTS(YEAR(Trandate), MONTH(Trandate), 1), branch, channelID
ORDER BY DATEFROMPARTS(YEAR(Trandate), MONTH(Trandate), 1)
Result:
YearT branch channelID TAmount TranT
2019-05-31 2 1 4200 2
2019-10-31 10 2 24000 2
2019-11-30 7 4 4400 1
2020-02-29 2 4 2500 1
2020-03-31 2 1 6200 2
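For comparison, here is a rough Python sketch of the same grouping, useful for checking the totals by hand. It uses calendar.monthrange to find the last day of each month (the job EOMONTH does above). The month_end and summarize names are made up for this example.

```python
import calendar
from collections import defaultdict
from datetime import date, datetime

# Sample rows from the answer: (Trandate, channelID, branch, amount)
ROWS = [
    ("20190501", 1, 2, 2000),
    ("20190511", 1, 2, 2200),
    ("20200309", 1, 2, 5600),
    ("20200315", 1, 2, 600),
    ("20191012", 2, 10, 12000),
    ("20191012", 2, 10, 12000),
    ("20191115", 4, 7, 4400),
    ("20200215", 4, 2, 2500),
]


def month_end(d):
    """Return the last calendar day of d's month."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return date(d.year, d.month, last_day)


def summarize(rows):
    """Sum amounts and count rows per (month end, branch, channelID)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of amount, row count]
    for trandate, channel_id, branch, amount in rows:
        day = datetime.strptime(trandate, "%Y%m%d").date()
        key = (month_end(day), branch, channel_id)
        totals[key][0] += amount
        totals[key][1] += 1
    return {key: tuple(value) for key, value in sorted(totals.items())}


summary = summarize(ROWS)
for (end, branch, channel_id), (amount, count) in summary.items():
    print(end, branch, channel_id, amount, count)
```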

Running total of positive and negative numbers where the sum cannot go below zero

This is an SQL question.
I have a column of numbers which can be positive or negative, and I'm trying to figure out a way to have a running sum of the column, but where the total cannot go below zero.
Date | Number | Desired | Actual
2020-01-01 | 8 | 8 | 8
2020-01-02 | 11 | 19 | 19
2020-01-03 | 30 | 49 | 49
2020-01-04 | -10 | 39 | 39
2020-01-05 | -12 | 27 | 27
2020-01-06 | -9 | 18 | 18
2020-01-07 | -26 | 0 | -8
2020-01-08 | 5 | 5 | -3
2020-01-09 | -23 | 0 | -26
2020-01-10 | 12 | 12 | -14
2020-01-11 | 14 | 26 | 0
I have tried a number of different window functions on this, but haven't found a way to prevent the running total from going into negative numbers.
Any help would be greatly appreciated.
EDIT - Added a date column to indicate the ordering
Unfortunately, there is no way to do this without cycling through the records one-by-one. That, in turn, requires something like a recursive CTE.
with t as (
select t.*, row_number() over (order by date) as seqnum
from mytable t
),
cte as (
select NULL as number, 0 as desired, 0 as seqnum
union all
select t.number,
(case when cte.desired + t.number < 0 then 0
else cte.desired + t.number
end),
cte.seqnum + 1
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select cte.*
from cte
where cte.number is not null;
I would recommend this approach only if your data is rather small. But then again, if you have to do this, there are not many alternatives other than going through the table row-by-agonizing-row.
Here is a db<>fiddle (using Postgres).
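The reason the state has to be carried from row to row is easier to see in a few lines of procedural code. The following Python sketch (function names are made up for illustration) contrasts a total that is floored at zero at every step with an ordinary running sum that is only clamped afterwards, which is what a CASE over the window sum would produce.

```python
from itertools import accumulate

# The Number column from the question, in date order
NUMBERS = [8, 11, 30, -10, -12, -9, -26, 5, -23, 12, 14]


def floored_running_sum(values):
    """Running total that is reset to 0 whenever it would go negative."""
    total = 0
    out = []
    for value in values:
        total = max(0, total + value)  # the floor feeds into the next step
        out.append(total)
    return out


def clamped_afterwards(values):
    """Plain running sum with negatives replaced by 0 only at output time."""
    return [max(0, running) for running in accumulate(values)]


desired = floored_running_sum(NUMBERS)
clamped = clamped_afterwards(NUMBERS)
print(desired)
print(clamped)
```

The two lists agree up to 2020-01-07 and diverge after that, because the plain running sum still carries the deficit that the floored total has already discarded.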
You can use a CASE operator and the SIGN function to do so…
CASE SIGN(my computed expression) WHEN -1 THEN 0 ELSE my computed expression END AS Actual
This can be done via a user-defined table function (UDTF) to manage the state you want to carry:
CREATE OR REPLACE FUNCTION non_neg_sum(val float) RETURNS TABLE (out_sum float)
LANGUAGE JAVASCRIPT AS
'{
processRow: function (row, rowWriter) {
this.sum += row.VAL;
if(this.sum < 0)
this.sum = 0;
rowWriter.writeRow({OUT_SUM: this.sum})
},
initialize: function() {
this.sum = 0;
}
}';
And used like so:
WITH input AS
(
SELECT *
FROM VALUES ('2020-01-01', 8, 8),
('2020-01-02', 11, 19 ),
('2020-01-03', 30, 49 ),
('2020-01-04',-10, 39 ),
('2020-01-05',-12, 27 ),
('2020-01-06', -9, 18 ),
('2020-01-07',-26, 0 ),
('2020-01-08', 5, 5 ),
('2020-01-09',-23, 0 ),
('2020-01-10', 12, 12 ),
('2020-01-11', 14, 26 ) d(day,num,wanted)
)
SELECT d.*
,sum(d.num)over(order by day) AS simple_sum
,j.*
FROM input AS d,
TABLE(non_neg_sum(d.num::float) OVER (ORDER BY d.day)) j
ORDER BY day
;
gives the results:
DAY NUM WANTED SIMPLE_SUM OUT_SUM
2020-01-01 8 8 8 8
2020-01-02 11 19 19 19
2020-01-03 30 49 49 49
2020-01-04 -10 39 39 39
2020-01-05 -12 27 27 27
2020-01-06 -9 18 18 18
2020-01-07 -26 0 -8 0
2020-01-08 5 5 -3 5
2020-01-09 -23 0 -26 0
2020-01-10 12 12 -14 12
2020-01-11 14 26 0 26
Another UDF solution:
select d, x, conditional_sum(x) from values
('2020-01-01', 8),
('2020-01-02', 11),
('2020-01-03', 30),
('2020-01-04', -10),
('2020-01-05', -12),
('2020-01-06', -9),
('2020-01-07', -26),
('2020-01-08', 5),
('2020-01-09', -23),
('2020-01-10', 12),
('2020-01-11', 14)
t(d,x)
order by d;
where conditional_sum is defined as:
create or replace function conditional_sum(X float)
returns float
language javascript
volatile
as
$$
if (!('sum' in this)) this.sum = 0
return this.sum = (X+this.sum)<0 ? 0 : this.sum+X
$$;
Demo of the CASE and SIGN approach:
WITH input AS
( SELECT *
FROM (VALUES
('2020-01-01', 8, 8),
('2020-01-02', 11, 19 ),
('2020-01-03', 30, 49 ),
('2020-01-04',-10, 39 ),
('2020-01-05',-12, 27 ),
('2020-01-06', -9, 18 ),
('2020-01-07',-26, 0 ),
('2020-01-08', 5, 5 ),
('2020-01-09',-23, 0 ),
('2020-01-10', 12, 12 ),
('2020-01-11', 14, 26 ),
('2020-01-12', 3, 26 )) AS d (day,num,wanted)
)
SELECT *, sum(num)over(order by day) AS CUM_SUM,
CASE SIGN(sum(num)over(order by day))
WHEN -1 THEN 0
ELSE sum(num)over(order by day)
END AS Actual
FROM input
ORDER BY day;
Returns:
day num wanted CUM_SUM Actual
---------- ----------- ----------- ----------- -----------
2020-01-01 8 8 8 8
2020-01-02 11 19 19 19
2020-01-03 30 49 49 49
2020-01-04 -10 39 39 39
2020-01-05 -12 27 27 27
2020-01-06 -9 18 18 18
2020-01-07 -26 0 -8 0
2020-01-08 5 5 -3 0
2020-01-09 -23 0 -26 0
2020-01-10 12 12 -14 0
2020-01-11 14 26 0 0
2020-01-12 3 26 3 3
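I added one more row to your test values to demonstrate that the final conditional sum is 3. Note that Actual differs from wanted from 2020-01-08 onward, because this clamps the plain running sum rather than carrying a floored total forward.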

SQL Query Comparison Processing Efficiency, Any Better Solution?

I'm working with a large data set of about 134 million rows. I would like to run a select query and insert the result into a table.
This is my table SQL script (SQL Fiddle).
Id | Emitter | EmitterIBAN | Receiver | ReceiverIBAN | Adresss | Value
---|---------|-------------|----------|--------------|---------|------
1 | Ernst | HR53 8827 2118 4692 8207 5 | Kimbra | CH20 1042 6T0N MDTG JT47 U | 3256 Arrowood Point 0002 | 121.72
2 | Keene | SK81 1004 7484 7505 6308 9259 | Torrance | RO23 ZWTR OJKK VAU9 T5P4 2GDY | 35197 Green Ridge Way | 82.52
3 | Ernst | HR53 8827 2118 4692 8207 5 | Kimbra | CH20 1042 6T0N MDTG JT47 U | 3256 Arrowood Point 0048 | 51.81
4 | Korie | ME43 9833 9830 7367 4239 60 | Roy | IL69 9686 1536 8102 2219 165 | 5 Swallow Alley | 88.01
5 | Ernst | HR53 8827 2118 4692 8207 5 | Kimbra | CH20 1042 6T0N MDTG JT47 U | 3256 Arrowood Point 0001 | 133.99
6 | Charmine | BG92 TOXX 8380 785I JKRQ JS | Sarette | MU67 RYRU 9293 5875 6859 7111 075X HR | 8 Sage Place | 36.30
7 | Ernst | HR53 8827 2118 4692 8207 5 | Kimbra | CH20 1042 6T0N MDTG JT47 U | 3256 Arrowood Point 0004 | 186.99
And I select my data with this query:
Select count(1) as NumberOperation,
MAX(Emitter) as EmitterName,
EmitterIban,
MAX(Receiver) as ReceiverName,
ReceiverIban,
MAX(ReceiverAddress) as ReceiverAddress,
SUM([Value]) as SumValues
FROM TableEsperadoceTransaction
Group By EmitterIban,
ReceiverIban
And I get the following result:
NumberOperation | Emitter | EmitterIBAN | Receiver | ReceiverIBAN | Adresss | SumValue
----------------|---------|-------------|----------|--------------|---------|---------
4 | Ernst | HR53 8827 2118 4692 8207 5 | Kimbra | CH20 1042 6T0N MDTG JT47 U | 3256 Arrowood Point 0002 | 494.51
1 | Keene | SK81 1004 7484 7505 6308 9259 | Torrance | RO23 ZWTR OJKK VAU9 T5P4 2GDY | 35197 Green Ridge Way | 82.52
1 | Korie | ME43 9833 9830 7367 4239 60 | Roy | IL69 9686 1536 8102 2219 165 | 5 Swallow Alley | 88.01
1 | Charmine | BG92 TOXX 8380 785I JKRQ JS | Sarette | MU67 RYRU 9293 5875 6859 7111 075X HR | 8 Sage Place | 36.30
I also have this solution
SELECT DISTINCT *
FROM (SELECT Count(1) AS NumberOperation,
emitteriban AS _EmitterIban,
receiveriban AS _ReceiverIban,
Sum([value]) AS SumValues
FROM tableesperadocetransaction
GROUP BY emitteriban,
receiveriban) tmp_T
LEFT JOIN tableesperadocetransaction
ON tableesperadocetransaction.emitteriban = tmp_T._emitteriban
AND tableesperadocetransaction.receiveriban =
tmp_T._receiveriban
I would like to know which of these two solutions is better, and whether there is a more efficient query than either.
Thanks
The second query is slower because:
It has a LEFT JOIN
It has a sub-query
It has a SELECT DISTINCT
Has a * instead of column names
The first one is the most natural way of doing this.
There is a lot about how to improve performance of queries and what to avoid. See for example: MSDN on improving queries
The 1st query should be far more efficient.
If you really want to speed things up, you'll want to make sure you have a covering index with EmitterIban, ReceiverIban as the key.
You can try this.
Compute MIN(Id) per group in the subquery, then use it in an INNER JOIN to pick up the names and address from that row. That is also a way.
SELECT
tmp.NumberOperation
,tb.Emitter
,tmp.EmitterIban
,tb.Receiver
,tmp.ReceiverIban
,tb.Adresss
,tmp.SumValues
FROM (SELECT Count(1) AS NumberOperation,
emitteriban AS EmitterIban,
receiveriban AS ReceiverIban,
Sum([value]) AS SumValues,
MIN(Id) AS Id
FROM tableesperadocetransaction
GROUP BY emitteriban,
receiveriban) tmp
INNER JOIN tableesperadocetransaction tb
ON tb.Id = tmp.Id
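The idea behind that query, sketched in Python on the sample rows: aggregate count, sum and the smallest Id per (EmitterIban, ReceiverIban) pair in one pass, then look up the descriptive columns from the row with that Id. The aggregate function and the tuple layout are assumptions made for this illustration, not part of the original schema.

```python
from decimal import Decimal

# Sample rows: (Id, Emitter, EmitterIBAN, Receiver, ReceiverIBAN, Adresss, Value)
ROWS = [
    (1, "Ernst", "HR53 8827 2118 4692 8207 5", "Kimbra", "CH20 1042 6T0N MDTG JT47 U", "3256 Arrowood Point 0002", Decimal("121.72")),
    (2, "Keene", "SK81 1004 7484 7505 6308 9259", "Torrance", "RO23 ZWTR OJKK VAU9 T5P4 2GDY", "35197 Green Ridge Way", Decimal("82.52")),
    (3, "Ernst", "HR53 8827 2118 4692 8207 5", "Kimbra", "CH20 1042 6T0N MDTG JT47 U", "3256 Arrowood Point 0048", Decimal("51.81")),
    (4, "Korie", "ME43 9833 9830 7367 4239 60", "Roy", "IL69 9686 1536 8102 2219 165", "5 Swallow Alley", Decimal("88.01")),
    (5, "Ernst", "HR53 8827 2118 4692 8207 5", "Kimbra", "CH20 1042 6T0N MDTG JT47 U", "3256 Arrowood Point 0001", Decimal("133.99")),
    (6, "Charmine", "BG92 TOXX 8380 785I JKRQ JS", "Sarette", "MU67 RYRU 9293 5875 6859 7111 075X HR", "8 Sage Place", Decimal("36.30")),
    (7, "Ernst", "HR53 8827 2118 4692 8207 5", "Kimbra", "CH20 1042 6T0N MDTG JT47 U", "3256 Arrowood Point 0004", Decimal("186.99")),
]


def aggregate(rows):
    """Group by (EmitterIBAN, ReceiverIBAN); take names from the MIN(Id) row."""
    by_id = {}
    groups = {}
    for row in rows:
        row_id, _, emitter_iban, _, receiver_iban, _, value = row
        by_id[row_id] = row
        group = groups.setdefault(
            (emitter_iban, receiver_iban),
            {"count": 0, "total": Decimal("0"), "min_id": row_id},
        )
        group["count"] += 1
        group["total"] += value
        group["min_id"] = min(group["min_id"], row_id)
    result = []
    for (emitter_iban, receiver_iban), group in groups.items():
        # the "join back": fetch descriptive columns from the row with MIN(Id)
        _, emitter, _, receiver, _, address, _ = by_id[group["min_id"]]
        result.append((group["count"], emitter, emitter_iban, receiver,
                       receiver_iban, address, group["total"]))
    return result


summary = aggregate(ROWS)
for line in summary:
    print(line)
```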

How to create a query on an existing table and build a table(view) with aggregated data and a restriction?

What I have is an MS-SQL database that I use to store data/info coming from equipment that is mounted in some vehicles (1-3 devices per vehicle).
For the moment, there is a table in the database named DeviceStatus: a big table used to store all the information the equipment sends when it connects to the TCP server. Records are added (SQL INSERT) or updated (SQL UPDATE) here.
The table looks like this:
Sample data:
DeviceSerial  VehicleNo  DevicePosition  DeviceSwVersion  DriverConsoleVersion
1040          305        3               8.00             0
1044          305        2               8.00             0
1063          305        1               8.01             1.34
1071          312        2               8.00             0
1075          312        1               8.00             1.33
1078          312        3               8.00             0
1099          414        3               8.00             0
1106          414        2               8.01             0
1113          102        1               8.01             1.34
1126          102        3               8.00             0
Remark: The driver console is always tied to the device installed in the first position (it is an extension of the device in position 1; obviously there is only one console per vehicle). This acts as a restriction for getting the correct info in the desired table (view) presented below.
What I need is a SQL query (command/statement) to create a table(view) for a so-called "Software Versions Table", where I can see the software version for all devices installed in vehicles (all that did connect and communicate with the server)... something like the table below:
Remark: Device#1 for 414 is missing because it didn't communicate (not yet I guess...)
With the information we have so far, I think you need a query with a PIVOT:
SELECT P.VehicleNo, V.DriverConsoleVersion, P.[1] AS [Device1SwVersion], P.[2] AS [Device2SwVersion], P.[3] AS [Device3SwVersion]
FROM (
  -- one row per vehicle, one column per device position
  SELECT VehicleNo, [1], [2], [3]
  FROM (
    SELECT VehicleNo, DevicePosition, DeviceSwVersion
    FROM @DeviceInfo
  ) as d
  PIVOT (
    MAX(DeviceSwVersion)
    FOR DevicePosition IN ([1], [2], [3])
  ) PIV
) P
-- console version comes from the device in position 1, if it has reported
LEFT JOIN @DeviceInfo V
  ON V.VehicleNo = P.VehicleNo AND V.DevicePosition = 1;
You can create a view with such a query.
The first subquery returns four columns per vehicle: VehicleNo plus the software version of devices 1 to 3.
It is then LEFT JOINed back to the device table to get the console version associated with device 1.
Output:
VehicleNo  DriverConsoleVersion  Device1SwVersion  Device2SwVersion  Device3SwVersion
102        1.34                  8.01              NULL              8.00
305        1.34                  8.01              8.00              8.00
312        1.33                  8.00              8.00              8.00
414        NULL                  NULL              8.01              8.00
Your data:
Declare @DeviceInfo TABLE([DeviceSerial] int, [VehicleNo] int, [DevicePosition] int, [DeviceSwVersion] varchar(10), [DriverConsoleVersion] varchar(10));
INSERT INTO @DeviceInfo([DeviceSerial], [VehicleNo], [DevicePosition], [DeviceSwVersion], [DriverConsoleVersion])
VALUES
(1040, 305, 3, '8.00', '0'),
(1044, 305, 2, '8.00', '0'),
(1063, 305, 1, '8.01', '1.34'),
(1071, 312, 2, '8.00', '0'),
(1075, 312, 1, '8.00', '1.33'),
(1078, 312, 3, '8.00', '0'),
(1099, 414, 3, '8.00', '0'),
(1106, 414, 2, '8.01', '0'),
(1113, 102, 1, '8.01', '1.34'),
(1126, 102, 3, '8.00', '0')
;
I like the PIVOT answer, but here is another way:
select VehicleNo,
max(DriverConsoleVersion) DriverConsoleVersion,
max(case when DevicePosition = 1 then DeviceSwVersion end) Device1SwVersion,
max(case when DevicePosition = 2 then DeviceSwVersion end) Device2SwVersion,
max(case when DevicePosition = 3 then DeviceSwVersion end) Device3SwVersion
from @DeviceInfo
group by VehicleNo
order by VehicleNo
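Both queries boil down to "one output row per vehicle, one slot per device position". A small Python sketch of that reshaping, using the sample rows, may make the logic easier to follow. The pivot function is hypothetical, and it takes the console version from the device in position 1 only, as the PIVOT answer does, so a vehicle without a position-1 row gets None.

```python
# Sample rows: (DeviceSerial, VehicleNo, DevicePosition, DeviceSwVersion, DriverConsoleVersion)
ROWS = [
    (1040, 305, 3, "8.00", "0"),
    (1044, 305, 2, "8.00", "0"),
    (1063, 305, 1, "8.01", "1.34"),
    (1071, 312, 2, "8.00", "0"),
    (1075, 312, 1, "8.00", "1.33"),
    (1078, 312, 3, "8.00", "0"),
    (1099, 414, 3, "8.00", "0"),
    (1106, 414, 2, "8.01", "0"),
    (1113, 102, 1, "8.01", "1.34"),
    (1126, 102, 3, "8.00", "0"),
]


def pivot(rows, positions=(1, 2, 3)):
    """Return {VehicleNo: {"console": ..., 1: ..., 2: ..., 3: ...}}."""
    table = {}
    for _serial, vehicle, position, sw_version, console_version in rows:
        entry = table.setdefault(
            vehicle, {"console": None, **{p: None for p in positions}}
        )
        entry[position] = sw_version
        if position == 1:
            # the console is an extension of the device in position 1
            entry["console"] = console_version
    return dict(sorted(table.items()))


versions = pivot(ROWS)
for vehicle, entry in versions.items():
    print(vehicle, entry["console"], entry[1], entry[2], entry[3])
```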
You can also do casting or formatting on them. So one might be:
select ...,
isnull(cast(cast(
max(case when DevicePosition = 1 then DeviceSwVersion end)
as decimal(8,2)) / 100 as varchar(5)), '') Device1SwVersion,