How to find the average of multiple columns with different weights? - SQL

I need to calculate the weighted average of Value_A, Value_B, and Value_C in MS SQL Server.
All the information I need is in a single row.
Each value has its own weight, and the average is:
(sum of value * weight) / (sum of weights)
Every column can be NULL. If there is a value but no weight, the weight defaults to 100;
if there is a weight but no value, that value (and its weight) is of course not considered.
e.g.
1st row:
(2*100 + 1*80) / (100 + 80) = 1.56 ≈ 1.6
2nd row:
(1*100 + 2*80) / (100 + 80)
+------+---------+---------+---------+----------+----------+----------+-----+
|  ID  | VALUE_A | VALUE_B | VALUE_C | Weight_A | Weight_B | Weight_C | AVG |
+------+---------+---------+---------+----------+----------+----------+-----+
| 1111 |    2    |    1    |  null   |   100    |    80    |    60    | 1.6 |
+------+---------+---------+---------+----------+----------+----------+-----+
| 2222 |    1    |    2    |  null   |   100    |    80    |    60    |     |
+------+---------+---------+---------+----------+----------+----------+-----+
I got this far to get the AVG values without weights:
select ID, VALUE_A, VALUE_B, VALUE_C, Weight_A, Weight_B, Weight_C,
       (SELECT AVG(Cast(c as decimal(18,1)))
        FROM (VALUES (VALUE_A),
                     (VALUE_B),
                     (VALUE_C)) T(c)) AS [Average]
FROM table
My second try was to multiply each value by its weight and sum the results, then divide by the sum of the weights. The sum of the weights is missing; I can't figure out how to add it.
select *,
       (SELECT SUM(Cast(c as decimal(18,1)))
        FROM (VALUES (VALUE_A * ISNULL(Weight_A, 100)),
                     (VALUE_B * ISNULL(Weight_B, 100)),
                     (VALUE_C * ISNULL(Weight_C, 100))
        ) T(c)) AS [Average]
FROM table

Is this what you are looking for?
SELECT t.ID,
       SUM(val * COALESCE(w, 100)) / SUM(COALESCE(w, 100)) as weighted_average,
       SUM(val * COALESCE(w, 100)) as weighted_sum
FROM table t CROSS APPLY
     (VALUES (t.VALUE_A, t.Weight_A),
             (t.VALUE_B, t.Weight_B),
             (t.VALUE_C, t.Weight_C)
     ) a(val, w)
WHERE a.val IS NOT NULL
GROUP BY t.ID;
Note that COALESCE is needed in the denominator as well, so a NULL weight counts as 100 in both places; the WHERE clause already drops weight-without-value pairs.

This is how Average could be calculated:
SELECT *,
       CASE
           WHEN (W.Weight_A + W.Weight_B + W.Weight_C) = 0
               THEN 0
           ELSE ((ISNULL(VALUE_A, 0) * W.Weight_A)
               + (ISNULL(VALUE_B, 0) * W.Weight_B)
               + (ISNULL(VALUE_C, 0) * W.Weight_C))
               / (W.Weight_A + W.Weight_B + W.Weight_C)
       END AS Average
FROM TABLE t
CROSS APPLY (SELECT CASE WHEN VALUE_A IS NULL THEN 0 ELSE ISNULL(Weight_A, 100) END AS [Weight_A],
                    CASE WHEN VALUE_B IS NULL THEN 0 ELSE ISNULL(Weight_B, 100) END AS [Weight_B],
                    CASE WHEN VALUE_C IS NULL THEN 0 ELSE ISNULL(Weight_C, 100) END AS [Weight_C]) W


Recursive loop in BigQuery for capped cumulative sum?

I'd like to be able to implement a "capped" cumulative sum in BigQuery using SQL.
Here's what I mean: I have a table whose rows have the amount by which a value is increased/decreased each day, but the value cannot go below 0 or above 100. I want to compute the cumulative sum of the changes to keep track of this value.
As an example, consider the following table:
day | change
--------------
1 | 70
2 | 50
3 | 20
4 | -30
5 | 10
6 | -90
7 | 20
I want to make a column that has the capped cumulative sum so that it looks like this:
day | change | capped cumsum
----------------------------
1 | 70 | 70
2 | 50 | 100
3 | 20 | 100
4 | -30 | 70
5 | 10 | 80
6 | -90 | 0
7 | 20 | 20
Simply doing SUM (change) OVER (ORDER BY day) and capping the values at 100 and 0 won't work. I need some sort of recursive loop and I don't know how to implement this in BigQuery.
Eventually I'd also like to do this over partitions, so that if I have something like
day | class | change
--------------------
 1  |   A   |   70
 1  |   B   |   12
 2  |   A   |   50
 2  |   B   |   83
 3  |   A   |  -30
 3  |   B   |   17
 4  |   A   |   10
 5  |   A   |  -90
 6  |   A   |   20
I can do the capped cumulative sum partitioned over each class.
I need some sort of recursive loop and I don't know how to implement this in BigQuery
A super naïve, cursor-based approach:
declare cumulative_change int64 default 0;
create temp table temp_table as (
select * , 0 as capped_cumsum from your_table where false
);
for rec in (select * from your_table order by day)
do
set cumulative_change = cumulative_change + rec.change;
set cumulative_change = case when cumulative_change < 0 then 0 when cumulative_change > 100 then 100 else cumulative_change end;
insert into temp_table (select rec.*, cumulative_change);
end for;
select * from temp_table order by day;
If applied to the sample data in your question, this produces the expected output.
A slightly modified option, using an array instead of a temp table:
declare cumulative_change int64 default 0;
declare result array<struct<day int64, change int64, capped_cumsum int64>>;
for rec in (select * from your_table order by day)
do
set cumulative_change = cumulative_change + rec.change;
set cumulative_change = case when cumulative_change < 0 then 0 when cumulative_change > 100 then 100 else cumulative_change end;
set result = array(select as struct * from unnest(result) union all select as struct rec.*, cumulative_change);
end for;
select * from unnest(result) order by day;
P.S. I like none of the above options so far :o)
Meanwhile, these approaches might work for relatively small tables / data sets.
Using a RECURSIVE CTE can be another option:
DECLARE sample ARRAY<STRUCT<day INT64, change INT64>> DEFAULT [
(1, 70), (2, 50), (3, 20), (4, -30), (5, 10), (6, -90), (7, 20)
];
WITH RECURSIVE ccsum AS (
SELECT 0 AS n, vals[OFFSET(0)] AS change,
CASE
WHEN vals[OFFSET(0)] > 100 THEN 100
WHEN vals[OFFSET(0)] < 0 THEN 0
ELSE vals[OFFSET(0)]
END AS cap_csum
FROM sample
UNION ALL
SELECT n + 1 AS n, vals[OFFSET(n + 1)] AS change,
CASE
WHEN cap_csum + vals[OFFSET(n + 1)] > 100 THEN 100
WHEN cap_csum + vals[OFFSET(n + 1)] < 0 THEN 0
ELSE cap_csum + vals[OFFSET(n + 1)]
END AS cap_csum
FROM ccsum, sample
WHERE n < ARRAY_LENGTH(vals) - 1
),
sample AS (
SELECT ARRAY_AGG(change ORDER BY day) vals FROM UNNEST(sample)
)
SELECT * EXCEPT(n) FROM ccsum ORDER BY n;
Eventually I'd also like to do this over partitions ...
Consider the solution below:
create temp function cap_value(value int64, lower_boundary int64, upper_boundary int64) as (
least(greatest(value, lower_boundary), upper_boundary)
);
with recursive temp_table as (
select *, row_number() over(partition by class order by day) as n from your_table
), iterations as (
select 1 as n, day, class, change, cap_value(change, 0, 100) as capped_cumsum
from temp_table
where n = 1
union all
select t.n, t.day, t.class, t.change, cap_value(i.capped_cumsum + t.change, 0, 100) as capped_cumsum
from temp_table t
join iterations i
on t.n = i.n + 1
and t.class = i.class
)
select * except(n) from iterations
order by class, day
If applied to the sample data in your question, this produces the expected output.
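The per-step clamping that all of the scripts above perform can be illustrated in plain Python. This is just a sketch of the update rule with a hypothetical `capped_cumsum` function; BigQuery itself is not involved:

```python
def capped_cumsum(changes, lo=0, hi=100):
    """Running sum where the total is clamped to [lo, hi] after every
    step -- the same update the BigQuery loop and recursive CTE perform."""
    total, out = 0, []
    for change in changes:
        total = min(max(total + change, lo), hi)
        out.append(total)
    return out

print(capped_cumsum([70, 50, 20, -30, 10, -90, 20]))
# [70, 100, 100, 70, 80, 0, 20]
```

For the partitioned variant, the same function would simply be applied to each class's changes separately, ordered by day.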

SQL Sort population by value and place in groups by value

I have to create a report and I'm having trouble figuring out how to approach it. On top of that, I don't have the proper vocabulary to express it, and thus to search for the solution. Please bear with me.
I have a population of accounts. The accounts must be ordered by value. The accounts at bottom 5% of the overall value are placed in a group (Group #5). The remaining 95% of the population are divided into four equal groups (Groups #1-4) by value (not by number of accounts).
The values of the accounts change over time so the results would change over time. I'm hoping to produce an output something like this...
ACC#  | VALUE | GROUP
------+-------+------
2615A |   24  |   1
0793A |   24  |   2
0652A |   12  |   3
6758A |   12  |   3
7764A |    6  |   4
8718A |    6  |   4
0155A |    6  |   4
6923A |    5  |   4
8079A |    3  |   5
2265A |    1  |   5
7421A |    1  |   5
I have the option of running it in SQL Server or Oracle(11g). Whichever gets me over the finish line.
Thanks in advance.
I would use row_number() and count() window functions:
select t.*,
(case when seqnum <= (cnt * 0.95 * 0.25) then 1
when seqnum <= (cnt * 0.95 * 0.50) then 2
when seqnum <= (cnt * 0.95 * 0.75) then 3
when seqnum <= (cnt * 0.95 * 1.00) then 4
else 5
end) as grp
from (select t.*,
row_number() over (order by value desc, acc) as seqnum,
count(*) over () as cnt
from t
) t;
Note: rows with the same value can be in different groups -- as in your example data. If you don't want this to be the case, then use rank() instead of row_number().
EDIT:
If you want groups of equal total value, use cumulative sums and totals:
select t.*,
(case when running_value <= (total_value * 0.95 * 0.25) then 1
when running_value <= (total_value * 0.95 * 0.50) then 2
when running_value <= (total_value * 0.95 * 0.75) then 3
when running_value <= (total_value * 0.95 * 1.00) then 4
else 5
end) as grp
from (select t.*,
sum(value) over (order by value desc, acc) as running_value,
sum(value) over () as total_value
from t
) t;
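The cumulative-value bucketing from the EDIT can be sketched in Python. `value_groups` is a hypothetical helper; note that with the sample data the computed boundaries land slightly differently from the hand-built groups in the question, since group cut-offs fall on cumulative value shares:

```python
def value_groups(rows):
    """Assign each (acc, value) row to a group by cumulative share of
    total value: groups 1-4 split the top 95% of value into quarters,
    and the bottom 5% of value goes to group 5 -- mirroring the
    cumulative-sum CASE expression above."""
    rows = sorted(rows, key=lambda r: (-r[1], r[0]))  # value desc, acc asc
    total = sum(v for _, v in rows)
    running, out = 0, []
    for acc, value in rows:
        running += value
        share = running / total
        if share <= 0.95 * 0.25:
            grp = 1
        elif share <= 0.95 * 0.50:
            grp = 2
        elif share <= 0.95 * 0.75:
            grp = 3
        elif share <= 0.95:
            grp = 4
        else:
            grp = 5
        out.append((acc, value, grp))
    return out
```

Applied to the eleven sample accounts (total value 100), the bottom three accounts (values 3, 1, 1) end up in group 5, matching the "bottom 5% of overall value" requirement.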
Using a few SUM ... OVER windows gets those results:
CREATE TABLE test
(
ID INT IDENTITY(1,1) PRIMARY KEY,
ACC# VARCHAR(5),
[VALUE] INT
);
INSERT INTO test
(ACC#, [VALUE]) VALUES
('2615A', 24),
('0793A', 24),
('0652A', 12),
('6758A', 12),
('7764A', 6),
('8718A', 6),
('0155A', 6),
('6923A', 5),
('8079A', 3),
('2265A', 1),
('7421A', 1);
WITH CTE_DATA AS
(
SELECT *,
CASE
WHEN (1.0*SUM([VALUE]) OVER (ORDER BY [VALUE], ID DESC)
/ SUM([VALUE]) OVER ()) <= 0.05
THEN 5
END AS grp
FROM test
)
SELECT ID, ACC#, [VALUE],
COALESCE(grp
, CEILING(FLOOR(
100.0*SUM([VALUE]) OVER (PARTITION BY grp ORDER BY [VALUE] DESC, ID)
/ SUM([VALUE]) OVER (PARTITION BY grp)
)/25)
) AS [GROUP]
FROM CTE_DATA
ORDER BY ID;
ID | ACC# | VALUE | GROUP
-: | :---- | ----: | :----
1 | 2615A | 24 | 1
2 | 0793A | 24 | 2
3 | 0652A | 12 | 3
4 | 6758A | 12 | 3
5 | 7764A | 6 | 4
6 | 8718A | 6 | 4
7 | 0155A | 6 | 4
8 | 6923A | 5 | 4
9 | 8079A | 3 | 5
10 | 2265A | 1 | 5
11 | 7421A | 1 | 5

query for column that are within a variable + or 1 of another column

I have a table that has 2 columns, and I am trying to determine a way to select the records where the two columns are CLOSE to one another, maybe based on standard deviation if I can work out how to do that. For now, this is what my table looks like:
ID| PCT | RETURN
1 | 20 | 1.20
2 | 15 | 0.90
3 | 0 | 3.00
The value in the pct field is a percent number (for example 20%). The value in the return field is a not-fully-calculated percent number (so it's supposed to be 20% above the initial value). The query I am working with so far is this:
select * from TABLE1 where ((pct = ((return - 1)* 100)));
What I'd like to end up with are the rows where both are within a set value of each other. For example If they are within 5 points of each other, then the row would be returned and the output would be:
ID| PCT | RETURN
1 | 20 | 1.20
2 | 15 | 0.90
In the above, ID 1 works out to PCT = 20 and RETURN = 20, and ID 2 to PCT = 15 and RETURN = 10. Because they were within 5 points of each other, the rows were returned.
ID 3 was not returned because 0 and 200 are far beyond the 5-point threshold.
Is there any way to set a variable that would return a +- 5 when comparing the two values from the above attributes? Thanks.
RexTester Example:
Use LEAD() OVER (ORDER BY PCT) to look ahead and LAG() to look back at the neighbouring row, do the math, and evaluate the results:
WITH CTE (ID, PCT, RETURN) as (
    SELECT 1, 20, 1.20 FROM DUAL UNION ALL
    SELECT 2, 15, 0.90 FROM DUAL UNION ALL
    SELECT 3, 0, 3.00 FROM DUAL),
CTE2 as (
    SELECT A.*, LEAD(PCT) OVER (ORDER BY PCT) LEADPCT, LAG(PCT) OVER (ORDER BY PCT) LAGPCT
    FROM CTE A)
SELECT * FROM CTE2
WHERE LEADPCT - PCT <= 5 OR PCT - LAGPCT <= 5
ORDER BY ID
Giving us:
+----+----+-----+--------+---------+--------+
| | ID | PCT | RETURN | LEADPCT | LAGPCT |
+----+----+-----+--------+---------+--------+
| 1 | 1 | 20 | 1,20 | NULL | 15 |
| 2 | 2 | 15 | 0,90 | 20 | 0 |
+----+----+-----+--------+---------+--------+
Or use the return value instead of PCT; it just depends on what you're after. But maybe I don't fully understand the question.
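The LEAD/LAG idea can be illustrated in Python. This is a sketch with a hypothetical `close_neighbors` helper that sorts by PCT and keeps rows within the threshold of either neighbour:

```python
def close_neighbors(rows, threshold=5):
    """Keep IDs of rows whose PCT is within `threshold` of the previous
    or next row's PCT when sorted by PCT -- the LEAD/LAG idea above."""
    ordered = sorted(rows, key=lambda r: r[1])  # rows are (id, pct, ret)
    keep = []
    for i, (rid, pct, ret) in enumerate(ordered):
        lag = ordered[i - 1][1] if i > 0 else None
        lead = ordered[i + 1][1] if i < len(ordered) - 1 else None
        if (lead is not None and lead - pct <= threshold) or \
           (lag is not None and pct - lag <= threshold):
            keep.append(rid)
    return sorted(keep)

print(close_neighbors([(1, 20, 1.20), (2, 15, 0.90), (3, 0, 3.00)]))  # [1, 2]
```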

SQL Server: select count of rows with not empty fields and total count of rows

The table has 4 int columns (Price0, Price1, Price2, Price3).
Example of table:
ID | Price0 | Price1 | Price2 | Price3 |
---+--------+--------+--------+--------+
1 | 10 | 20 | NULL | NULL |
2 | 70 | NULL | NULL | NULL |
3 | 30 | 40 | 50 | NULL |
How to query this table to get
total count of rows
and count of rows where count of filled Price columns >= N (for example N = 2)
Result must be:
Total | Filled
------+-------
3 | 2
This query shows how many Price fields are filled in each row:
select
(select count(*) as filledFieldsCount
from (values (T.Price0), (T.Price1), (T.Price2), (T.Price3)) as v(col)
where v.col is not null
)
from Table1 T
With only 4 columns, wouldn't a simple nested CASE WHEN be straightforward?
select count(*),
       sum(case when (
           CASE WHEN Price0 is null THEN 0 ELSE 1 END +
           CASE WHEN Price1 is null THEN 0 ELSE 1 END +
           CASE WHEN Price2 is null THEN 0 ELSE 1 END +
           CASE WHEN Price3 is null THEN 0 ELSE 1 END) >= 2 then 1 else 0 end)
FROM Table1
You can do this with conditional aggregation:
select count(*),
sum(case when tt.filledFieldsCount >= 2 then 1 else 0 end)
from Table1 T outer apply
(select count(*) as filledFieldsCount
from (values (T.Price0), (T.Price1), (T.Price2), (T.Price3)) as v(col)
where v.col is not null
) tt;
I moved the subquery to the from clause using apply. This is an example of a lateral join. In this case, it does the same thing as the subquery.
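The counting logic both answers share (count the non-NULL columns per row, then count the rows at or above the threshold) can be sketched in Python; `filled_counts` is a hypothetical helper, not part of either query:

```python
def filled_counts(rows, n=2):
    """For each row (a tuple of price columns, None = NULL), count the
    non-NULL entries; return (total rows, rows with at least n filled)."""
    filled = [sum(v is not None for v in row) for row in rows]
    return len(rows), sum(c >= n for c in filled)

# The three sample rows: 2, 1, and 3 filled Price columns respectively.
print(filled_counts([(10, 20, None, None),
                     (70, None, None, None),
                     (30, 40, 50, None)]))  # (3, 2)
```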

SQL Query to find the product of values in same column

This is an interview question I was asked before.
Given a table with columns boxName and value, find the volume of each box.
The value field holds the length, breadth, and height of a box, one per row. I am required to multiply all three dimensions.
I could use GROUP BY if a sum were needed, but here a product is required:
box1 = 12*13*1 = 156
box2 = 1*23*6 = 138
box3 = 12*1*20 = 240
+---------+-------+
| boxName | value |
+---------+-------+
| box1    |    12 |
| box1    |    13 |
| box1    |     1 |
| box2    |     1 |
| box2    |    23 |
| box2    |     6 |
| box3    |    12 |
| box3    |     1 |
| box3    |    20 |
+---------+-------+
Try this:
select EXP(SUM(LN(value))) AS Product_val, boxName
from yourTable
group by boxName
Note: LN will fail when value <= 0.
When you can have value <= 0, use this instead:
SELECT
boxName,
CASE
WHEN MinVal = 0 THEN 0
WHEN Neg % 2 = 1 THEN -1 * EXP(ABSMult)
ELSE EXP(ABSMult)
END
FROM
(
SELECT
boxName,
--log of +ve row values
SUM(LN(ABS(NULLIF(Value, 0)))) AS ABSMult,
--count of -ve values. Even = +ve result.
SUM(SIGN(CASE WHEN Value < 0 THEN 1 ELSE 0 END)) AS Neg,
--anything * zero = zero
MIN(ABS(Value)) AS MinVal
FROM
yourTable
GROUP BY
boxName
) foo
Referred from this answer.
If you know for a fact that each box will have exactly 3 rows for the 3 dimensions, you can use the row_number() analytic function to uniquely identify the 3 dimensions, and then use max(case ...) to extract the 3 dimensions and multiply them:
select boxName,
max(case when rn = 1 then value end) *
max(case when rn = 2 then value end) *
max(case when rn = 3 then value end) as volume
from (select t.*,
row_number() over (partition by t.boxName order by null) as rn
from yourTable t)
group by boxName
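The EXP(SUM(LN(value))) trick can be checked in plain Python. This is a sketch with a hypothetical `group_products` helper; like the first SQL query, it assumes strictly positive values:

```python
import math

def group_products(rows):
    """Product of values per box computed as exp(sum(ln(v))) -- the same
    trick as the SQL above. Only valid for strictly positive values."""
    sums = {}
    for box, value in rows:
        sums[box] = sums.get(box, 0.0) + math.log(value)
    # round() absorbs the tiny floating-point error from the log/exp round trip
    return {box: round(math.exp(s)) for box, s in sums.items()}

data = [("box1", 12), ("box1", 13), ("box1", 1),
        ("box2", 1), ("box2", 23), ("box2", 6),
        ("box3", 12), ("box3", 1), ("box3", 20)]
print(group_products(data))  # {'box1': 156, 'box2': 138, 'box3': 240}
```

This works because ln(a) + ln(b) + ln(c) = ln(a*b*c), so exponentiating the sum recovers the product; that is exactly why SUM can be aggregated per group while the result is still a product.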