group by dynamic interval with starting and ending point SQL Server


I have a table containing a column DED with non-negative numbers that have no upper limit. I want to group them into intervals that always start at 0 (lower bound closed, upper bound open) and get each interval's total and its share of the overall total.
Suppose I have the following data:
DED AMT
0.0004 4
0.0009 1
0.001 2
0.002 1
0.009 4
0.01 5
0.04 6
0.09 3
0.095 1
0.9 3
1 2
100 1
500 1
so I would want the following intervals:
DED AMT PAMT
0-0.01 12 0.3529
0.01-0.02 5 0.1470
0.04-0.05 6 0.1764
0.09-0.1 4 0.1176
0.9-1 3 0.0882
1 2 0.0588
I have tried:
SELECT CAST(DED/.02*.02 AS VARCHAR) +' - '+CAST(DED/.02*.02 +.01 AS VARCHAR) AS DED,
SUM(AMT) AS AMT,ISNULL(SUM(AMT)*1.000/NULLIF(SUM(SUM(AMT)) OVER (),0),0) AS PAMT
FROM MYTABLE
WHERE DED/.02*.02<=1
GROUP BY DED/.02*.02
Thanks for your help

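SELECT
    ROUND(DED, 2, 1)        AS DED_lower,
    ROUND(DED, 2, 1) + 0.01 AS DED_upper,
    SUM(AMT)                AS SUM_AMT,
    -- the window function runs over the grouped rows, so SUM(AMT) has to be wrapped in a second SUM
    SUM(AMT) * 1.0 / SUM(SUM(AMT)) OVER () AS PAMT
FROM
    mytable
WHERE
    DED <= 1
GROUP BY
    ROUND(DED, 2, 1)
ROUND(DED, 2, 1) truncates (rounds down) to two decimal places, giving equal-sized bands of width 0.01. Note that the WHERE clause is applied before the window function, so PAMT is relative to the rows with DED <= 1 only.
Apologies for typos or formatting, I'm on my phone.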
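As a cross-check of the banding logic, here is a minimal Python sketch rather than SQL. The function name `bin_totals` and its parameters are made up for illustration. It assumes the share should be taken against the total of all rows, which is what the question's PAMT column appears to use (12/34 ≈ 0.3529), and it uses `Decimal` so that values such as 0.09 land in the expected band.

```python
from decimal import Decimal, ROUND_FLOOR

# (DED, AMT) pairs from the question; DED is kept as text so Decimal stays exact
rows = [
    ("0.0004", 4), ("0.0009", 1), ("0.001", 2), ("0.002", 1), ("0.009", 4),
    ("0.01", 5), ("0.04", 6), ("0.09", 3), ("0.095", 1), ("0.9", 3),
    ("1", 2), ("100", 1), ("500", 1),
]

def bin_totals(rows, width=Decimal("0.01"), max_ded=Decimal("1")):
    """Sum AMT per band [lower, lower + width) for DED <= max_ded.

    Returns {lower_bound: (amt, share_of_all_rows)}.
    """
    grand_total = sum(amt for _, amt in rows)  # assumption: share is over ALL rows
    bins = {}
    for ded_text, amt in rows:
        ded = Decimal(ded_text)
        if ded > max_ded:
            continue
        # floor to a multiple of the band width (same idea as ROUND(DED, 2, 1))
        lower = (ded / width).to_integral_value(rounding=ROUND_FLOOR) * width
        bins[lower] = bins.get(lower, 0) + amt
    return {lower: (amt, Decimal(amt) / grand_total)
            for lower, amt in sorted(bins.items())}

result = bin_totals(rows)
for lower, (amt, share) in result.items():
    print(lower, amt, round(share, 4))
```

Changing `width` gives a different band size without touching the rest of the logic.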

Related

Postgres calculate average using distinct IDs, values also distinct

I have a postgres query that is supposed to calculate an average value based on a set of values. This set of values should be based on DISTINCT ID's.
The query is the following:
#{context.answers_base}
SELECT
stores.name as store_name,
answers_base.question_name as question_name,
answers_base.question_id as question_id,
(sum(answers_base.answer_value) / NULLIF(count(answers_base.answer_id),0)) as score, # <--- this line is calculating wrong
sum(answers_base.answer_value) as score_sum,
count(answers_base.answer_id) as question_answer_count,
count(DISTINCT answers_base.answer_id) as answer_count
FROM answers_base
INNER JOIN stores ON stores.id = answers_base.store_id
WHERE answers_base.answer_value IS NOT NULL AND answers_base.question_type_id = :question_type_id
AND answers_base.scale = TRUE
#{context.filter_answers}
GROUP BY stores.name, answers_base.question_name, answers_base.question_id, answers_base.sort_order
ORDER BY stores.name, answers_base.sort_order
The thing is, that on the indicated line (sum(answers_base.answer_value) / NULLIF(count(answers_base.answer_id),0)) some values are counted more than once.
Part of the solution is making it DISTINCT based on ID, like so:
(sum(answers_base.answer_value) / NULLIF(count(DISTINCT answers_base.answer_id),0))
This divides by the right number, but the sum being divided is still wrong.
Doing the following (make sum() DISTINCT) does not work, for the reason that values are not unique. The values are either 0 / 25 / 50 / 75 / 100, so different IDs might contain 'same' values.
(sum(DISTINCT answers_base.answer_value) / NULLIF(count(DISTINCT answers_base.answer_id),0))
How would I go about making this work?
Here are simplified versions of the table structures.
Table Answer
ID  answer_date
1   Feb 01, 2022
2   Mar 02, 2022
3   Mar 13, 2022
4   Mar 21, 2022
Table AnswerRow
ID  answer_id  answer_value
1   1          25
2   1          50
3   1          50
4   2          75
5   2          100
6   2          0
7   3          25
8   4          25
9   4          100
10  4          50
Answer 1's answer_rows:
25 + 50 + 50 -> average = 125 / 3
Answer 2's answer_rows:
75 + 100 + 0 -> average = 175 / 3
Answer 3's answer_rows:
25 -> average = 25 / 1
Answer 4's answer_rows:
25 + 100 + 50 -> average = 175 / 3
For some reason, we get duplicate answer_rows in the calculation.
Example of the problem; for answer_id=1 we have the following answer_rows in the calculation, giving us a different average:
ID  answer_id  answer_value
1   1          25
2   1          50
3   1          50
3   1          50
3   1          50
3   1          50
Result: 25 + 50 + 50 + 50 + 50 + 50 -> 275 / 6
Desired result: 25 + 50 + 50 -> 125 / 3
Making answer_row_id distinct (see beginning of post) makes it possible for me to get:
25 + 50 + 50 + **50 + 50 + 50** -> 275 / **3**
But not
25 + 50 + 50 -> 125 / 3
What I would like to achieve is having a calculation that selects answer_row distinctly based on its ID, and those answer_rows will be used both for calculation x and y in calculation average -> x / y.
answers_base is the following (simplified):
WITH answers_base as (
SELECT
answers.id as answer_id,
answers.store_id as store_id,
answer_rows.id as answer_row_id,
question_options.answer_value as answer_value
FROM answers
INNER JOIN answer_rows ON answers.id = answer_rows.answer_id
INNER JOIN stores ON stores.id = answers.store_id
WHERE answers.status = 0
)

I think this would be best solved with a window function. Something along the lines of
SELECT
ROW_NUMBER() OVER (PARTITION BY answer_rows.id ORDER BY answer_rows.created_at DESC) AS duplicate_answers
...
WHERE
answer_rows.duplicate_answers = 1
This would filter out multiple rows with the same id, and only keep one entry. (I chose the "first by created_at", but you could change this to whatever logic suits you best.)
A benefit to this approach is that it makes the rationale behind the logic clear, contained and re-usable.
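To make the "keep one entry per id, then aggregate" idea concrete, here is a small Python sketch (not Postgres). The function name `average_distinct_rows` is hypothetical, and the data is the duplicated answer_id = 1 example from the question. The same de-duplicated set feeds both the sum and the count, which is the property the question asks for.

```python
# (answer_row_id, answer_id, answer_value) as they arrive after the joins,
# including the repeated copies of answer_row 3
joined_rows = [
    (1, 1, 25), (2, 1, 50), (3, 1, 50),
    (3, 1, 50), (3, 1, 50), (3, 1, 50),
]

def average_distinct_rows(rows):
    """Keep the first row seen for each answer_row_id, then average the values."""
    unique = {}
    for row_id, _answer_id, value in rows:
        unique.setdefault(row_id, value)  # later duplicates of the same id are ignored
    total = sum(unique.values())
    count = len(unique)
    return total, count, total / count

total, count, average = average_distinct_rows(joined_rows)
print(total, count, average)  # 125 3 41.666...
```

In SQL the equivalent step would be to de-duplicate in a subquery or CTE first and run sum() and count() on its result.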

SUM in SQL Server with PARTITION BY clause

I have the following table
QuotationId QuotationDetailId DriverId RangeFrom RangeTo FixedAmount UnitAmount
-------------------------------------------------------------------------------------------
10579 7 1 1 1 1154.00 0.00
10579 7 2 2 2 1731.00 0.00
10579 11 1 0 10 0.00 88.53
10579 11 2 11 24 885.30 100.50
10579 11 3 25 34 2292.30 88.53
I need to write a query in SQL Server with the following logic.
The grouping is QuotationId + QuotationDetailId.
Within each such block, from the second row onward, I need to add the previous row's FixedAmount + UnitAmount * RangeFrom to the FixedAmount of the current row.
So in this case the resulting output should be
QuotationId QuotationDetailId DriverId RangeFrom RangeTo FixedAmount UnitAmount
10579 7 1 1 1 1154.00 0.00
10579 7 2 2 2 2885.00 0.00
10579 11 1 0 10 0.00 88.53
10579 11 2 11 24 1770.60 100.50
10579 11 3 25 34 7174.90 88.53
I've tried several queries without success. Can someone suggest a way to do that?
Best regards
Fabrizio

In SQL Server 2012+, you can do a cumulative sum. I'm not sure exactly what the logic is you want, but this seems reasonable given the data set:
select t.*,
sum(FixedAmount*UnitAmount) over (partition by QuotationId, QuotationDetailId
order by DriverId
) as running_sum
from t;

You can use a correlated subquery. Your 'amount' column then appears in the select list as a query in parentheses:
SELECT ...fields...,
(SELECT SUM(A.unitAmount * A.RangeFrom + A.fixedAmount)
From YourTable A
WHERE A.QuotationId = B.QuotationId
AND A.QuotationDetailId = B.QuotationDetailId
AND A.DriverId <= B.DriverId) AS Amount
From YourTable B
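The correlated subquery above amounts to a running total, per (QuotationId, QuotationDetailId), of UnitAmount * RangeFrom + FixedAmount ordered by DriverId. The Python sketch below shows what that expression yields on the sample rows; the function name `running_amounts` is made up for illustration. It is not guaranteed to reproduce the question's expected output, whose exact rule is unclear (for detail 11 it gives 1990.80 where the question shows 1770.60).

```python
from decimal import Decimal as D

# (QuotationId, QuotationDetailId, DriverId, RangeFrom, RangeTo, FixedAmount, UnitAmount)
quotation_rows = [
    (10579, 7, 1, 1, 1, D("1154.00"), D("0.00")),
    (10579, 7, 2, 2, 2, D("1731.00"), D("0.00")),
    (10579, 11, 1, 0, 10, D("0.00"), D("88.53")),
    (10579, 11, 2, 11, 24, D("885.30"), D("100.50")),
    (10579, 11, 3, 25, 34, D("2292.30"), D("88.53")),
]

def running_amounts(rows):
    """Running sum of UnitAmount * RangeFrom + FixedAmount within each group."""
    totals = {}   # (QuotationId, QuotationDetailId) -> running total so far
    result = []
    # sort by group, then DriverId, like PARTITION BY ... ORDER BY DriverId
    for q_id, detail_id, driver_id, range_from, _range_to, fixed, unit in sorted(
        rows, key=lambda r: (r[0], r[1], r[2])
    ):
        key = (q_id, detail_id)
        totals[key] = totals.get(key, D("0")) + unit * range_from + fixed
        result.append((q_id, detail_id, driver_id, totals[key]))
    return result

amounts = running_amounts(quotation_rows)
for row in amounts:
    print(row)
```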

Calculating Run Cost for lengths of Pipe & Pile

I work for a small company and we're trying to get away from Excel workbooks for inventory control. I thought I had it figured out with help from (Nasser), but it's beyond me. This is what I can get into a table; from there I need to get it to look like the expected output below.
My data
ID|GrpID|InOut| LoadFt | LoadCostft| LoadCost | RunFt | RunCost| AvgRunCostFt
1 1 1 4549.00 0.99 4503.51 4549.00 0 0
2 1 1 1523.22 1.29 1964.9538 6072.22 0 0
3 1 2 -2491.73 0 0 3580.49 0 0
4 1 2 -96.00 0 0 3484.49 0 0
5 1 1 8471.68 1.41 11945.0688 11956.17 0 0
6 1 2 -369.00 0 0 11468.0568 0 0
7 2 1 1030.89 5.07 5223.56 1030.89 0 0
8 2 1 314.17 5.75 1806.4775 1345.06 0 0
9 2 1 239.56 6.3 1508.24 1509.228 0 0
10 2 2 -554.46 0 0 954.768 0 0
11 2 1 826.24 5.884 4861.5961 1781.008 0 0
Expected output
ID|GrpID|InOut| LoadFt | LoadCostft| LoadCost | RunFt | RunCost| AvgRunCostFt
1 1 1 4549.00 0.99 4503.51 4549.00 4503.51 0.99
2 1 1 1523.22 1.29 1964.9538 6072.22 6468.4638 1.0653
3 1 2 -2491.73 1.0653 -2490.6647 3580.49 3977.7991 1.111
4 1 2 -96.00 1.111 -106.656 3484.49 3871.1431 1.111
5 1 1 8471.68 1.41 11945.0688 11956.17 15816.2119 1.3228
6 1 2 -369.00 1.3228 -488.1132 11468.0568 15328.0987 1.3366
7 2 1 1030.89 5.07 5223.56 1030.89 5223.56 5.067
8 2 1 314.17 5.75 1806.4775 1345.06 7030.0375 5.2266
9 2 1 239.56 6.3 1508.24 1509.228 8539.2655 5.658
10 2 2 -554.46 5.658 -3137.1346 954.768 5402.1309 5.658
11 2 1 826.24 5.884 4861.5961 1781.008 10263.727 5.7629
The first record of a group is considered the opening balance. Inventory going into the yard has an InOut of 1 and inventory going out of the yard has an InOut of 2. Load footage going into the yard always has a load cost per foot, and I can calculate the running total of footage. For the first record of a group, the run cost and run cost per foot are easy to calculate. The following records are more difficult: when something goes out of the yard, I need to carry the previous average run cost per foot forward as its load cost per foot, and then calculate the run cost and average run cost per foot again. Hopefully this makes sense to somebody and we can automate some of these calculations. Thanks for any help.
Here's an Oracle example I found:
select order_id
     , volume
     , price
     , total_vol
     , total_costs
     , unit_costs
  from ( select order_id
              , volume
              , price
              , volume total_vol
              , 0.0 total_costs
              , 0.0 unit_costs
              , row_number() over (order by order_id) rn
           from costs
          order by order_id
       )
 model
       dimension by (order_id)
       measures (volume, price, total_vol, total_costs, unit_costs)
       rules iterate (4)
       ( total_vol[any] = volume[cv()] + nvl(total_vol[cv()-1],0.0)
       , total_costs[any]
         = case SIGN(volume[cv()])
           when -1 then total_vol[cv()] * nvl(unit_costs[cv()-1],0.0)
           else volume[cv()] * price[cv()] + nvl(total_costs[cv()-1],0.0)
           end
       , unit_costs[any] = total_costs[cv()] / total_vol[cv()]
       )
 order by order_id
/
ORDER_ID VOLUME PRICE TOTAL_VOL TOTAL_COSTS UNIT_COSTS
---------- ---------- ---------- ---------- ----------- ----------
1 1000 100 1000 100000 100
2 -500 110 500 50000 100
3 1500 80 2000 170000 85
4 -100 150 1900 161500 85
5 -600 110 1300 110500 85
6 700 105 2000 184000 92
6 rows selected.

Let me say three things first:
This is certainly not the best way to do it. As a rule of thumb, if you need a while-loop in SQL, you are most probably doing something wrong.
I suspect there are some calculation errors in your original "Expected output". Please check them, since the values I get by following your formulas are different.
This question could also be seen as a "gimme teh codez" type of question, but since you asked a decently formed question with some follow-up research, my answer is below. (So no upvoting, since this is help for a specific case.)
Now onto the solution:
I attempted to use my initial hint of the LAG function in a nicely formed single UPDATE statement, but since a window function (such as LAG) can only appear in a SELECT or ORDER BY clause, that will not work.
What the code below does in short:
It calculates the various calculated fields for each record when they can be calculated and with the appropriate functions, updates the table and then moves onto the next record.
Please see comments in the code for additional information.
TempTable is a demo table (visible in the linked SQLFiddle).
Please read this answer for information about decimal(19, 4)
-- Our state and running variables
DECLARE @curId INT = 0,
        @curGrpId INT,
        @prevId INT = 0,
        @prevGrpId INT = 0,
        @LoadCostFt DECIMAL(19, 4),
        @RunFt DECIMAL(19, 4),
        @RunCost DECIMAL(19, 4)
WHILE EXISTS (SELECT 1
              FROM TempTable
              WHERE DoneFlag = 0) -- DoneFlag is a bit column I added to the table for calculation purposes, could also be called "IsCalced"
BEGIN
    SELECT top 1 -- top 1 here to get the next row based on the ID column
        @prevId = @curId,
        @curId = tmp.ID,
        @curGrpId = Grpid
    FROM TempTable tmp
    WHERE tmp.DoneFlag = 0
    ORDER BY tmp.GrpID, tmp.ID -- order by to ensure that we get everything from one GrpID first
    -- Calculate the LoadCostFt.
    -- It is either predetermined (if InOut = 1) or derived from the previous record's AvgRunCostFt (if InOut = 2)
    SELECT @LoadCostFt = CASE
                             WHEN tmp.INOUT = 2
                                 THEN (lag(tmp.AvgRunCostFt, 1, 0.0) OVER (partition BY GrpId ORDER BY ID))
                             ELSE tmp.LoadCostFt
                         END
    FROM TempTable tmp
    WHERE tmp.ID IN (@curId, @prevId)
      AND tmp.GrpID = @curGrpId
    -- Calculate the LoadCost
    UPDATE TempTable
    SET LoadCost = LoadFt * @LoadCostFt
    WHERE Id = @curId
    -- Calculate the current RunFt and RunCost based on the current LoadFt and LoadCost plus the previous row's RunFt and RunCost
    SELECT @RunFt = (LoadFt + (lag(RunFt, 1, 0) OVER (partition BY GrpId ORDER BY ID))),
           @RunCost = (LoadCost + (lag(RunCost, 1, 0) OVER (partition BY GrpId ORDER BY ID)))
    FROM TempTable tmp
    WHERE tmp.ID IN (@curId, @prevId)
      AND tmp.GrpID = @curGrpId
    -- Set all our values, including the AvgRunCostFt calc
    UPDATE TempTable
    SET RunFt = @RunFt,
        RunCost = @RunCost,
        LoadCostFt = @LoadCostFt,
        AvgRunCostFt = @RunCost / @RunFt,
        doneflag = 1
    WHERE ID = @curId
END
SELECT ID, GrpID, InOut, LoadFt, RunFt, LoadCost,
       RunCost, LoadCostFt, AvgRunCostFt
FROM TempTable
ORDER BY GrpID, Id
The output with your sample data and a SQLFiddle demonstrating how it all works:
ID GrpID InOut LoadFt RunFt LoadCost RunCost LoadCostFt AvgRunCostFt
1 1 1 4549 4549 4503.51 4503.51 0.99 0.99
2 1 1 1523.22 6072.22 1964.9538 6468.4638 1.29 1.0653
3 1 2 -2491.73 3580.49 -2654.44 3814.0238 1.0653 1.0652
4 1 2 -96 3484.49 -102.2592 3711.7646 1.0652 1.0652
5 1 1 8471.68 11956.17 11945.0688 15656.8334 1.41 1.3095
6 1 2 -369 11587.17 -483.2055 15173.6279 1.3095 1.3095
7 2 1 1030.89 1030.89 5226.6123 5226.6123 5.07 5.07
8 2 1 314.17 1345.06 1806.4775 7033.0898 5.75 5.2288
9 2 1 239.56 1584.62 1509.228 8542.3178 6.3 5.3908
10 2 2 -554.46 1030.16 -2988.983 5553.3348 5.3908 5.3907
11 2 1 826.24 1856.4 4861.5962 10414.931 5.884 5.6103
If you are unclear about parts of the code, I can update with additional explanations.
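For readers who want to check the arithmetic outside the database, the row-by-row rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the T-SQL above: the function name `running_average_cost` is made up, it uses floats without the DECIMAL(19, 4) rounding, and it assumes rows arrive ordered by GrpID and ID.

```python
def running_average_cost(rows):
    """rows: (id, grp_id, in_out, load_ft, load_cost_ft), ordered by grp_id then id.

    Inbound rows (in_out == 1) carry their own cost per foot; outbound rows
    (in_out == 2) are priced at the group's previous average run cost per foot.
    """
    state = {}   # grp_id -> (run_ft, run_cost) after the previous row
    result = []
    for row_id, grp_id, in_out, load_ft, load_cost_ft in rows:
        run_ft, run_cost = state.get(grp_id, (0.0, 0.0))
        if in_out == 2:
            load_cost_ft = run_cost / run_ft if run_ft else 0.0
        load_cost = load_ft * load_cost_ft
        run_ft += load_ft
        run_cost += load_cost
        avg_run_cost_ft = run_cost / run_ft if run_ft else 0.0
        state[grp_id] = (run_ft, run_cost)
        result.append({
            "id": row_id, "load_cost_ft": load_cost_ft, "load_cost": load_cost,
            "run_ft": run_ft, "run_cost": run_cost, "avg_run_cost_ft": avg_run_cost_ft,
        })
    return result

# First three rows of GrpID 1 from the question
sample = [
    (1, 1, 1, 4549.00, 0.99),
    (2, 1, 1, 1523.22, 1.29),
    (3, 1, 2, -2491.73, 0.0),
]
costed = running_average_cost(sample)
for row in costed:
    print(row)
```

One property worth noting: removing stock at the current average cost leaves the average cost per foot unchanged, so it only moves when new stock comes in at a different price.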

Sum operation performed on rows till specified value: a new row for each group for which the sum exceeds the specified value

CREATE TABLE TEMP(RESOURCE_VALUE VARCHAR2(63 BYTE),TOT_COUNT NUMBER)
I want a query that splits the rows into consecutive ranges of RESOURCE_VALUE such that the sum of TOT_COUNT in each range does not exceed a given value. For example, if the limit is 50,000, it has to display every range (from which RESOURCE_VALUE to which RESOURCE_VALUE) whose sum is <= 50,000. Each RESOURCE_VALUE can be included in only one range.
Example with sample data.
The input is below:
resource_value | tot_count
---------------+----------
1 100
2 50
3 20
4 30
5 300
6 250
7 200
8 30
9 60
10 200
11 110
12 120
Then the output has to be something like this :
sample output 1: when sum(tot_count)<=300
start resource_value end resource_value sum
---------------------+---------------------+-------
1 4 300
5 5 300
6 6 250
7 9 290
10 10 200
11 12 230
sample output 2: when sum(tot_count)<=500
start resource_value end resource_value sum
---------------------+---------------------+------
1 4 300
5 5 300
6 8 480
9 12 490

Judging by your table structure, I am guessing that you use Oracle. In Oracle you can use a recursive query like this (it assumes RESOURCE_VALUE runs consecutively from 1 and uses a limit of 300):
with vw1(val,flg,sumval) as
(select 1 val,0 flg,TOT_COUNT sumval
from TEMP where RESOURCE_VALUE = '1'
union all
select vw1.val + 1 val,
case when vw1.sumval + t1.TOT_COUNT > 300 then vw1.flg + 1 else vw1.flg end flg,
case when vw1.sumval + t1.TOT_COUNT > 300 then t1.TOT_COUNT else vw1.sumval + t1.TOT_COUNT end sumval
From TEMP t1,vw1 WHERE t1.RESOURCE_VALUE = TO_CHAR(vw1.val + 1))
select min(val) START_RESOURCE_VALUE,max(val) END_RESOURCE_VALUE,
max(sumval) "SUM" from vw1 group by flg order by min(val);
SQL Fiddle
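The recursive query applies a simple greedy rule: keep adding rows to the current range, and start a new range as soon as the running sum would exceed the limit. A short Python sketch of that rule follows; the function name `group_by_running_sum` is made up for illustration, and the data is the sample input from the question.

```python
# (resource_value, tot_count) from the question, in resource_value order
resource_rows = [
    (1, 100), (2, 50), (3, 20), (4, 30), (5, 300), (6, 250),
    (7, 200), (8, 30), (9, 60), (10, 200), (11, 110), (12, 120),
]

def group_by_running_sum(rows, limit):
    """Greedily split rows into consecutive ranges whose sum stays <= limit.

    Returns a list of (start_value, end_value, range_sum).
    """
    groups = []
    for value, count in rows:
        if groups and groups[-1][2] + count <= limit:
            start, _end, total = groups[-1]
            groups[-1] = (start, value, total + count)   # extend the current range
        else:
            groups.append((value, value, count))         # start a new range
    return groups

ranges_300 = group_by_running_sum(resource_rows, 300)
ranges_500 = group_by_running_sum(resource_rows, 500)
print(ranges_300)
# [(1, 4, 200), (5, 5, 300), (6, 6, 250), (7, 9, 290), (10, 10, 200), (11, 12, 230)]
print(ranges_500)
# [(1, 5, 500), (6, 8, 480), (9, 12, 490)]
```

A single row larger than the limit still gets a range of its own in this sketch; whether that is acceptable depends on the data.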

Manipulating SQL query to create data categories that return maximum values

I have a table that looks like this at the moment:
Day Limit Price
1 52 0.3
1 4 70
1 44 200
1 9 0.01
1 0 0.03
1 0 0.03
2 52 0.4
2 10 70
2 44 200
2 5 0.01
2 0 0.55
2 2 50
Is there a way I can use SQL to manipulate the result into a table with different categories for price and selecting the maximum value for the limit respective to its price?
Day 0-10 10-100 100+
1 52 4 44
2 52 10 44

You can use CASE and MAX. The ranges below are half-open so that a price of exactly 10 or 100 falls into only one category (BETWEEN is inclusive on both ends):
SELECT Day,
MAX(CASE WHEN Price >= 0 AND Price < 10 THEN Limit ELSE 0 END) as ZeroToTen,
MAX(CASE WHEN Price >= 10 AND Price < 100 THEN Limit ELSE 0 END) as TenToHundred,
MAX(CASE WHEN Price >= 100 THEN Limit ELSE 0 END) as HundredPlus
FROM YourTable
GROUP BY Day
Here is the Fiddle.
BTW -- if you're using MySQL, add backticks around LIMIT since it's a keyword.
Good luck.
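The same conditional aggregation can be sketched in Python to check the expected result. The function name `max_limit_by_price_band` is hypothetical, and the bands are assumed to be half-open: [0, 10), [10, 100) and 100 or more.

```python
# (Day, Limit, Price) rows from the question
price_rows = [
    (1, 52, 0.3), (1, 4, 70), (1, 44, 200), (1, 9, 0.01), (1, 0, 0.03), (1, 0, 0.03),
    (2, 52, 0.4), (2, 10, 70), (2, 44, 200), (2, 5, 0.01), (2, 0, 0.55), (2, 2, 50),
]

def max_limit_by_price_band(rows):
    """Per day, the maximum Limit within each price band (0 when a band is empty)."""
    result = {}
    for day, limit, price in rows:
        bands = result.setdefault(day, [0, 0, 0])
        index = 0 if price < 10 else 1 if price < 100 else 2
        bands[index] = max(bands[index], limit)
    return {day: tuple(bands) for day, bands in result.items()}

pivot = max_limit_by_price_band(price_rows)
print(pivot)  # {1: (52, 4, 44), 2: (52, 10, 44)}
```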
