Product of distinct values before a certain date - sql

I have a table with schema:
date | item_id | factor
----------------------
20180710 | 1 | 0.1
20180711 | 1 | 0.1
20180712 | 1 | 2
20180713 | 1 | 2
20180714 | 1 | 2
20180710 | 2 | 0.1
20180711 | 2 | 0.1
20180712 | 2 | 5
20180713 | 2 | 5
20180714 | 2 | 10
The factor for each item_id can change on any date. On each date, I need to calculate the product of all the distinct factors for each item_id up to that date (inclusive), so the final output for the above table should be:
date | id | cumulative_factor
20180710 | 1 | 0.1
20180711 | 1 | 0.1
20180712 | 1 | 0.2
20180713 | 1 | 0.2
20180714 | 1 | 0.2
20180710 | 2 | 0.1
20180711 | 2 | 0.1
20180712 | 2 | 0.5
20180713 | 2 | 0.5
20180714 | 2 | 5
Logic:
On 20180711, for id=1, the only distinct factor is 0.1, so the cumulative factor is 0.1.
On 20180714, for id=1, the distinct factors are 0.1 and 2, so the cumulative factor is 0.1*2 = 0.2.
On 20180714, for id=2, the distinct factors are 0.1, 5 and 10, so the cumulative factor is 0.1*5*10 = 5.
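The rule above can be checked with a small reference sketch in plain Python (not SQL). It assumes the rows are exactly the sample data from the question; the helper name `cumulative_distinct_product` is made up for illustration.

```python
from math import prod

# Sample rows from the question: (date, id, factor).
rows = [
    (20180710, 1, 0.1), (20180711, 1, 0.1), (20180712, 1, 2),
    (20180713, 1, 2), (20180714, 1, 2),
    (20180710, 2, 0.1), (20180711, 2, 0.1), (20180712, 2, 5),
    (20180713, 2, 5), (20180714, 2, 10),
]

def cumulative_distinct_product(rows):
    """For each (date, id), multiply the distinct factors seen for that id up to that date."""
    seen = {}    # id -> set of distinct factors seen so far
    result = []
    # Process each id in date order so "up to that date" is well defined.
    for date, item_id, factor in sorted(rows, key=lambda r: (r[1], r[0])):
        seen.setdefault(item_id, set()).add(factor)
        # round() only hides floating-point noise in the product.
        result.append((date, item_id, round(prod(seen[item_id]), 10)))
    return result

expected = cumulative_distinct_product(rows)
for row in expected:
    print(row)
```

Any SQL solution can be compared against this list on the sample data.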
I've tried
select a.id, a.date, b.cum_factor
from factor_table a
left join (
select id, date, ISNULL(EXP(SUM(distinct log_factor)),1) as cum_factor
from factor_table
where date < a.date
) b
on a.id=b.id and a.date = b.date
but I get the error
a.date not found

There is no product aggregate function in SQL Server.
However, you can emulate it with EXP ( SUM ( LOG ( value ) ) ), which works as long as every value is positive.
See the inline comments in the query below.
; with
cte as
(
-- this CTE sets the factor to 1 if it is the same as in the previous row,
-- since you want the `product of distinct` values
select *,
factor2 = CASE WHEN LAG(factor) OVER (PARTITION BY id
ORDER BY [date]) IS NULL
OR LAG(factor) OVER (PARTITION BY id
ORDER BY [date]) <> factor
THEN factor
ELSE 1
END
from factor_table
),
cte2 as
(
-- this CTE performs only the running SUM(LOG(factor2)); EXP() is applied in the final select
select *, factor3 = SUM(LOG(factor2)) OVER (PARTITION BY id
ORDER BY [date])
from cte
)
-- EXP() is not a window function, so it is applied separately at the outer level
select *, EXP(factor3) as cumulative_factor
from cte2
Note: LAG() requires SQL Server 2012 or later.
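To see why EXP(SUM(LOG(...))) behaves like a product, here is a minimal Python sketch of the same steps as the query above. The function names are hypothetical, and the input is assumed to be the factors of a single id in date order.

```python
import math

def product_via_logs(values):
    # Same identity the query relies on: exp(log a + log b) == a * b.
    # Only valid for strictly positive values, because log is undefined otherwise.
    return math.exp(sum(math.log(v) for v in values))

def running_product_on_change(factors):
    """Replace a factor by 1 when it equals the previous row, then keep a running product via logs."""
    out, log_sum, prev = [], 0.0, None
    for f in factors:
        effective = f if prev is None or f != prev else 1
        log_sum += math.log(effective)   # log(1) == 0, so repeated rows add nothing
        out.append(math.exp(log_sum))
        prev = f
    return out

result = running_product_on_change([0.1, 0.1, 5, 5, 10])
print([round(x, 10) for x in result])
```

Note that this compares each row only with the previous one. If a factor returned to an earlier value (for example 0.1, 2, 0.1), it would be multiplied in again, so the approach matches "distinct" only when old values never reappear.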

Something seems wrong with multiplying distinct factors. In principle, you could express this using window functions:
select f.id, f.date,
exp(sum(distinct log(f.factor)) over (partition by f.id order by f.date)) as cum_factor
from factor_table f;
However, SQL Server does not allow distinct in a window aggregate. To get around that limitation, count each factor only the first time it appears:
select f.id, f.date,
exp(sum(log(case when seqnum = 1 then factor end)) over (partition by f.id order by f.date)) as cum_factor
from (select t.*,
row_number() over (partition by id, factor order by date) as seqnum
from factor_table t
) f;

Related

Group by and fetch column that is not in group by clause

I have (sample) data:
equipment_id | node_id | value (type: jsonb)
------------------------------
1 | 1 | 0.3
1 | 2 | 0.4
2 | 3 | 0.7
2 | 4 | 0.6
2 | 5 | 0.7
I want to get the rows that have the maximum value within each equipment_id:
equipment_id | node_id | value
------------------------------
1 | 2 | 0.4
2 | 3 | 0.7
2 | 5 | 0.7
There is a query that does what I want, but I'm afraid of performance degradation because of casting jsonb to float:
with cte as (
select
equipment_id,
max(value::text::float) as val
from metrics
group by equipment_id
)
select cte.equipment_id, m.node_id, cte.val
from cte
join metrics m on cte.equipment_id = m.equipment_id and cte.val = m.value::text::float
How can I avoid casting?
Use distinct on:
select distinct on (equipment_id) m.*
from metrics m
order by equipment_id, value desc;
If your value is actually stored as a string, then use:
order by equipment_id, value::numeric desc;
You can use row_number():
select *
from (
select *, row_number() over (partition by equipment_id order by value::text::float desc) as rn
from metrics
) a
where rn = 1;
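Both distinct on and row_number() return a single row per equipment_id, while the expected output in the question keeps ties (nodes 3 and 5 both have 0.7). In SQL, using rank() in place of row_number() would keep tied rows. The tie-keeping logic itself, sketched in plain Python with a hypothetical helper name and the question's sample data:

```python
from collections import defaultdict

# Sample rows from the question: (equipment_id, node_id, value).
metrics = [(1, 1, 0.3), (1, 2, 0.4), (2, 3, 0.7), (2, 4, 0.6), (2, 5, 0.7)]

def rows_with_group_max(rows):
    """Keep every row whose value equals the maximum of its equipment_id, ties included."""
    best = defaultdict(lambda: float("-inf"))
    for equipment_id, _node_id, value in rows:
        best[equipment_id] = max(best[equipment_id], value)
    return [r for r in rows if r[2] == best[r[0]]]

top_rows = rows_with_group_max(metrics)
print(top_rows)
```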

SQL Sort population by value and place in groups by value

I have to create a report. I'm having trouble figuring out how to approach it. On top of that, I don't have the proper vocabulary to express it, and therefore to search for a solution. Please bear with me.
I have a population of accounts. The accounts must be ordered by value. The accounts in the bottom 5% of the overall value are placed in a group (Group #5). The remaining 95% of the population is divided into four equal groups (Groups #1-4) by value (not by number of accounts).
The values of the accounts change over time so the results would change over time. I'm hoping to produce an output something like this...
ACC# |VALUE|GROUP|
------+-----+-----+
2615A | 24 | 1
0793A | 24 | 2
0652A | 12 | 3
6758A | 12 | 3
7764A | 6 | 4
8718A | 6 | 4
0155A | 6 | 4
6923A | 5 | 4
8079A | 3 | 5
2265A | 1 | 5
7421A | 1 | 5
I have the option of running it in SQL Server or Oracle(11g). Whichever gets me over the finish line.
Thanks in advance.
I would use row_number() and count() window functions:
select t.*,
(case when seqnum <= (cnt * 0.95 * 0.25) then 1
when seqnum <= (cnt * 0.95 * 0.50) then 2
when seqnum <= (cnt * 0.95 * 0.75) then 3
when seqnum <= (cnt * 0.95 * 1.00) then 4
else 5
end) as grp
from (select t.*,
row_number() over (order by value desc, acc) as seqnum,
count(*) over () as cnt
from t
) t;
Note: rows with the same value can be in different groups -- as in your example data. If you don't want this to be the case, then use rank() instead of row_number().
EDIT:
If you want equal value, just use cumulative sums and totals:
select t.*,
(case when running_value <= (total_value * 0.95 * 0.25) then 1
when running_value <= (total_value * 0.95 * 0.50) then 2
when running_value <= (total_value * 0.95 * 0.75) then 3
when running_value <= (total_value * 0.95 * 1.00) then 4
else 5
end) as grp
from (select t.*,
sum(value) over (order by value desc, acc) as running_value,
sum(value) over () as total_value
from t
) t;
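As a rough check of the cumulative-sum version, the same bucketing can be sketched in Python on the question's sample data. The function name is hypothetical, and integer arithmetic is used for the thresholds to avoid floating-point edge cases.

```python
accounts = [
    ("2615A", 24), ("0793A", 24), ("0652A", 12), ("6758A", 12),
    ("7764A", 6), ("8718A", 6), ("0155A", 6), ("6923A", 5),
    ("8079A", 3), ("2265A", 1), ("7421A", 1),
]

def value_groups(accounts):
    """Assign groups 1-4 by quarter of the top 95% of total value, group 5 to the rest."""
    ordered = sorted(accounts, key=lambda a: (-a[1], a[0]))  # value desc, then acc
    total = sum(value for _, value in ordered)
    running, out = 0, []
    for acc, value in ordered:
        running += value
        # running <= total * 0.95 * k / 4  is the same as  running * 400 <= total * 95 * k
        grp = next((k for k in (1, 2, 3, 4) if running * 400 <= total * 95 * k), 5)
        out.append((acc, value, grp))
    return out

groups = value_groups(accounts)
for row in groups:
    print(row)
```

On this data the first account already carries the running total (24) past the first threshold (23.75), so it lands in group 2 and group 1 stays empty. A row is classified by where its running total ends, which may or may not be the boundary rule you want.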
Using a few SUM() OVER expressions seems to produce those results:
CREATE TABLE test
(
ID INT IDENTITY(1,1) PRIMARY KEY,
ACC# VARCHAR(5),
[VALUE] INT
);
INSERT INTO test
(ACC#, [VALUE]) VALUES
('2615A', 24),
('0793A', 24),
('0652A', 12),
('6758A', 12),
('7764A', 6),
('8718A', 6),
('0155A', 6),
('6923A', 5),
('8079A', 3),
('2265A', 1),
('7421A', 1);
WITH CTE_DATA AS
(
SELECT *,
CASE
WHEN (1.0*SUM([VALUE]) OVER (ORDER BY [VALUE], ID DESC)
/ SUM([VALUE]) OVER ()) <= 0.05
THEN 5
END AS grp
FROM test
)
SELECT ID, ACC#, [VALUE],
COALESCE(grp
, CEILING(FLOOR(
100.0*SUM([VALUE]) OVER (PARTITION BY grp ORDER BY [VALUE] DESC, ID)
/ SUM([VALUE]) OVER (PARTITION BY grp)
)/25)
) AS [GROUP]
FROM CTE_DATA
ORDER BY ID;
ID | ACC# | VALUE | GROUP
-: | :---- | ----: | :----
1 | 2615A | 24 | 1
2 | 0793A | 24 | 2
3 | 0652A | 12 | 3
4 | 6758A | 12 | 3
5 | 7764A | 6 | 4
6 | 8718A | 6 | 4
7 | 0155A | 6 | 4
8 | 6923A | 5 | 4
9 | 8079A | 3 | 5
10 | 2265A | 1 | 5
11 | 7421A | 1 | 5

Grouping SQL Results based on order

I have a table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers so that my result looks something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient, since the table I need to do this on has 52 million rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments:
"The rownumbers for group1 could be '1,1,2,2,3,4' and for group2 they could be '1,2,4,6' ... a higher number followed by a lower would be a new group."
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
Assuming ID is the clustered index, the plan for this has one scan against YourTable and avoids any sort operations.
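The LAG-plus-running-sum idea can be sketched outside the database as well. This is a minimal Python illustration with a hypothetical function name, assuming the input list is already ordered by ID:

```python
from itertools import accumulate

def group_ids(row_numbers):
    """Flag a row as a new group when it is the first row or its RowNumber drops, then sum the flags."""
    prevs = [None] + row_numbers[:-1]             # plays the role of LAG(RowNumber)
    flags = [1 if p is None or p > r else 0       # plays the role of the NewGroup column
             for p, r in zip(prevs, row_numbers)]
    return list(accumulate(flags))                # running SUM(NewGroup)

print(group_ids([1, 2, 3, 1, 2, 1, 2, 3, 4]))
print(group_ids([1, 1, 2, 2, 3, 4, 1, 2, 4, 6]))
```

The second call uses the sequence from the clarified requirements: repeated or skipped row numbers stay in the same group, and only a drop starts a new one.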
If the ids are truly sequential, you can do:
select t.*,
(id - rowNumber) as grp
from t
You can also use a recursive CTE:
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.

SQL Server - Sum entire column AND Group By

Suppose I had the following table in SQL Server:
grp: val: criteria:
a 1 1
a 1 1
b 1 1
b 1 1
b 1 1
c 1 1
c 1 1
c 1 1
d 1 1
Now what I want is to get an output which would basically be:
Select grp, val / [sum(val) for all records] grouped by grp where criteria = 1
So, given the following is true:
Sum of all values = 9
Sum of values in grp(a) = 2
Sum of values in grp(b) = 3
Sum of values in grp(c) = 3
Sum of values in grp(d) = 1
The output would be as follows:
grp: calc:
a 2/9
b 3/9
c 3/9
d 1/9
What would my SQL have to look like??
Thanks!!
You should be able to use something like this which uses sum() over():
select distinct grp,
sum(val) over(partition by grp)
/ (sum(val) over(partition by criteria)*1.0) Total
from yourtable
where criteria = 1
The result is:
| GRP | TOTAL |
------------------------
| a | 0.222222222222 |
| b | 0.333333333333 |
| c | 0.333333333333 |
| d | 0.111111111111 |
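For comparison, the same shares can be computed with a plain GROUP BY and a scalar subquery for the overall total. The sketch below runs that query through Python's sqlite3 module against an in-memory SQLite database as a stand-in for SQL Server; the table and column names follow the answer above, everything else is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table yourtable (grp text, val integer, criteria integer)")
# Sample data from the question: two a's, three b's, three c's, one d.
conn.executemany(
    "insert into yourtable values (?, ?, ?)",
    [(g, 1, 1) for g in "aabbbcccd"],
)

rows = conn.execute(
    """
    select grp,
           sum(val) * 1.0 / (select sum(val) from yourtable where criteria = 1) as share
    from yourtable
    where criteria = 1
    group by grp
    order by grp
    """
).fetchall()

for grp, share in rows:
    print(grp, round(share, 12))
```

Multiplying by 1.0 forces a non-integer division, which is the same purpose the `*1.0` serves in the window-function query.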
I completely agree with @bluefeet's response -- this is just a little more of a database-independent approach (should work with most RDBMS):
select distinct
grp,
sum(val)/cast(total as decimal)
from yourtable
cross join
(
select SUM(val) as total
from yourtable
) sumtable
where criteria = 1
GROUP BY grp, total

How can I calculate the remaining amount per row?

I have a table, and for each row id I want to find the amount remaining from the total. However, the amounts should be taken in ascending order.
id amount
1 3
2 2
3 1
4 5
The results should look like this:
id remainder
1 10
2 8
3 5
4 0
Any thoughts on how to accomplish this? I'm guessing that the over clause is the way to go, but I can't quite piece it together. Thanks.
Since you didn't specify your RDBMS, I will just assume it's Postgresql ;-)
select *, sum(amount) over() - sum(amount) over(order by amount) as remainder
from tbl;
Output:
| ID | AMOUNT | REMAINDER |
---------------------------
| 3 | 1 | 10 |
| 2 | 2 | 8 |
| 1 | 3 | 5 |
| 4 | 5 | 0 |
How it works: http://www.sqlfiddle.com/#!1/c446a/5
It works in SQL Server 2012 too: http://www.sqlfiddle.com/#!6/c446a/1
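The arithmetic behind that query (total minus running sum, in ascending order of amount) can be sketched in Python. The function name is hypothetical and the rows are the question's sample data:

```python
from itertools import accumulate

# Sample rows from the question: (id, amount).
rows = [(1, 3), (2, 2), (3, 1), (4, 5)]

def remainders(rows):
    """Return (id, amount, remainder), where remainder = total - running sum by ascending amount."""
    ordered = sorted(rows, key=lambda r: r[1])
    total = sum(amount for _, amount in ordered)
    running = accumulate(amount for _, amount in ordered)
    return [(id_, amount, total - run) for (id_, amount), run in zip(ordered, running)]

result = remainders(rows)
for row in result:
    print(row)
```

One caveat: this sketch advances the running sum one row at a time, whereas a SQL running sum ordered only by amount uses a default frame that treats rows with equal amounts as peers, so results can differ when amounts repeat.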
Thinking of solution for SQL Server 2008...
Btw, is your ID just a mere row number? If it is, just do this:
select
row_number() over(order by amount) as rn
, sum(amount) over() - sum(amount) over(order by amount) as remainder
from tbl
order by rn;
Output:
| RN | REMAINDER |
------------------
| 1 | 10 |
| 2 | 8 |
| 3 | 5 |
| 4 | 0 |
But if you really need the ID intact and move the smallest amount on top, do this:
with a as
(
select *, sum(amount) over() - sum(amount) over(order by amount) as remainder,
row_number() over(order by id) as id_sort,
row_number() over(order by amount) as amount_sort
from tbl
)
select a.id, sort.remainder
from a
join a sort on sort.amount_sort = a.id_sort
order by a.id_sort;
Output:
| ID | REMAINDER |
------------------
| 1 | 10 |
| 2 | 8 |
| 3 | 5 |
| 4 | 0 |
See query progression here: http://www.sqlfiddle.com/#!6/c446a/11
I just want to offer a simpler way to do this in descending order:
select id, sum(amount) over (order by id desc) as Remainder
from t
This will work in Oracle, SQL Server 2012, and Postgres.
The general solution requires a self join:
select t.id, coalesce(sum(tafter.amount), 0) as Remainder
from t left outer join
t tafter
on t.id < tafter.id
group by t.id
SQL Server 2008 answer. I can't provide an SQL Fiddle because it seems to strip the begin keyword, resulting in syntax errors. I tested this on my machine though:
create function RunningTotalGuarded()
returns @ReturnTable table(
Id int,
Amount int not null,
RunningTotal int not null,
RN int identity(1,1) not null primary key clustered
)
as
begin
insert into @ReturnTable(id, amount, RunningTotal)
select id, amount, 0 from tbl order by amount;
declare @RunningTotal numeric(16,4) = 0;
declare @rn_check int = 0;
update @ReturnTable
set
@rn_check = @rn_check + 1
,@RunningTotal =
case when rn = @rn_check then
@RunningTotal + Amount
else
1 / 0
end
,RunningTotal = @RunningTotal;
return;
end;
To achieve your desired output:
with a as
(
select *, sum(amount) over() - RunningTotal as remainder
, row_number() over(order by id) as id_order
from RunningTotalGuarded()
)
select a.id, amount_order.remainder
from a
inner join a amount_order on amount_order.rn = a.id_order;
Rationale for guarded running total: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html
Choose the lesser evil ;-)