SQL: Show Records Once SUM Threshold Is Reached

I have a table, sorted on a date value (ASC).
+----+------------+-------+
| Id | Date | Value |
+----+------------+-------+
| 1 | 2018-01-01 | 10 |
| 2 | 2018-01-02 | 5 |
| 3 | 2018-01-03 | 15 |
| 4 | 2018-01-04 | 0 |
| 5 | 2018-01-05 | 5 |
| 6 | 2018-01-06 | 10 |
| 7 | 2018-01-07 | 5 |
| 8 | 2018-01-08 | 0 |
| 9 | 2018-01-09 | 0 |
| 10 | 2018-01-10 | 10 |
+----+------------+-------+
I would like to create a view that only returns the records once the SUM of the Value is higher than 30, starting from the first record.
So my threshold is 30: every record whose value fits within the first 30 should be hidden.
All records that follow once this threshold is reached need to be shown.
This means that my required result looks like this:
+----+------------+-------+
| Id | Date | Value |
+----+------------+-------+
| 4 | 2018-01-04 | 0 |
| 5 | 2018-01-05 | 5 |
| 6 | 2018-01-06 | 10 |
| 7 | 2018-01-07 | 5 |
| 8 | 2018-01-08 | 0 |
| 9 | 2018-01-09 | 0 |
| 10 | 2018-01-10 | 10 |
+----+------------+-------+
As you can see, Ids 1, 2 and 3 are left out, because their values (10, 5 and 15) sum to 30.
Once this threshold is reached, the remaining records are visible (even the 0 value of Id 4).
I've created some scripts to set up a test table with data:
-- Create test table
CREATE TABLE thresholdTest (
[Id] INT IDENTITY(1,1) PRIMARY KEY,
[Date] DATE NOT NULL,
[Value] INT NOT NULL
)
-- Insert dummies
INSERT INTO [thresholdTest] ([Date],[Value])
VALUES
('2018-01-01',10),
('2018-01-02',5),
('2018-01-03',15),
('2018-01-04',0),
('2018-01-05',5),
('2018-01-06',10),
('2018-01-07',5),
('2018-01-08',0),
('2018-01-09',0),
('2018-01-10',10);
-- Select ordered by date
SELECT *
FROM [thresholdTest]
ORDER BY [Date] ASC
All I need is a SELECT statement / view.
The threshold is always static (30 in this example).
The data could of course differ, but it's always sorted on a Date and includes a Value.
Thank you in advance.

I'd use a window function:
;with cte as(
select *, tot = sum([Value]) over (order by [Date])
from thresholdTest
)
select
Id,
[Date],
[Value]
from cte
where
(tot >= 30 and [Value] = 0)
or tot > 30
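Since the question asks for a view, the query above could be wrapped roughly as follows. This is only a sketch: the view name vThresholdReached is made up, and the explicit ROWS frame with Id as a tie-breaker is an assumption added so the running total stays deterministic if two rows ever share a Date (the sample data has unique dates).

```sql
-- Hypothetical view name; table and column names come from the question.
-- CREATE VIEW must be the first statement in its batch.
CREATE VIEW vThresholdReached
AS
WITH cte AS (
    SELECT Id,
           [Date],
           [Value],
           -- Running total up to and including the current row.
           SUM([Value]) OVER (ORDER BY [Date], Id
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tot
    FROM thresholdTest
)
SELECT Id, [Date], [Value]
FROM cte
WHERE tot > 30                       -- threshold already exceeded
   OR (tot >= 30 AND [Value] = 0);   -- threshold exactly met by earlier rows
GO

-- A view has no guaranteed order, so sort when reading from it.
SELECT Id, [Date], [Value]
FROM vThresholdReached
ORDER BY [Date];
```

GO is the batch separator used by SSMS and sqlcmd, not a T-SQL statement; other clients would run the two statements separately.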

You can use SUM as a window function in a subquery to compute the running total (aliased totle below), then filter on it in the outer query.
select Id,
Date,
Value
from
(
SELECT *,
SUM(Value) OVER(ORDER BY Date) totle
FROM thresholdTest
) t
WHERE totle > 30 OR (Value = 0 AND totle = 30)
[Results]:
| Id | Date | Value |
|----|------------|-------|
| 4 | 2018-01-04 | 0 |
| 5 | 2018-01-05 | 5 |
| 6 | 2018-01-06 | 10 |
| 7 | 2018-01-07 | 5 |
| 8 | 2018-01-08 | 0 |
| 9 | 2018-01-09 | 0 |
| 10 | 2018-01-10 | 10 |

Yet another way to do it
select t1.id, t1.Date,t1.Value
from [thresholdTest] t1
inner join [thresholdTest] t2 on t1.id >= t2.id
group by t1.id, t1.value, t1.Date
HAVING SUM(t2.VAlue)>30 OR( SUM(t2.value)=30 AND t1.value=0)

Related

How to sum 2 columns and add it with the previous summed columns in sql?

I have a table with these rows:
+------+--------+---------+---------+
| ID | Date | Amount1 | Amount2 |
+------+--------+---------+---------+
| 1 | 13 Nov | 8 | 3 |
| 2 | 11 Nov | 5 | 1 |
| 3 | 15 Nov | 0 | 3 |
| 4 | 18 Nov | 5 | 7 |
| 5 | 20 Nov | 10 | 0 |
+------+--------+---------+---------+
I would like to query for this result, using the formula
Total = (Amount1 - Amount2) + Previous Row's Total
+------+--------+---------+---------+---------+
| ID | Date | Plus | Minus | Total |
+------+--------+---------+---------+---------+
| 2 | 11 Nov | 5 | 1 | 4 |
| 1 | 13 Nov | 8 | 3 | 9 |
| 3 | 15 Nov | 0 | 3 | 6 |
| 4 | 18 Nov | 5 | 7 | 4 |
| 5 | 20 Nov | 10 | 0 | 14 |
+------+--------+---------+---------+---------+
Is there any way to query this without binding the Total to a column on temporary table?
To get a running total, you can use SUM(columnname) OVER (ORDER BY sortedcolumnname).
To me it's actually a little counterintuitive compared to most windowed functions, as it doesn't have a partition but produces different results over the set of rows. However, it does work.
Here is some somewhat-obfuscated documentation from Microsoft about it.
I think you can therefore use
SELECT mt.[ID],
mt.[Date],
mt.[Amount1] AS [Plus],
mt.[Amount2] AS [Minus],
SUM(mt.[Amount1] - mt.[Amount2]) OVER (ORDER BY mt.[Date], mt.[ID]) AS Total
FROM mytable mt
ORDER BY mt.[Date],
mt.[ID];
And here are the results - they match yours.
ID Date Plus Minus Total
2 2020-11-11 5 1 4
1 2020-11-13 8 3 9
3 2020-11-15 0 3 6
4 2020-11-18 5 7 4
5 2020-11-20 10 0 14
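The reason this works without a PARTITION BY is that an ORDER BY inside OVER implies a default frame of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. A minimal sketch that spells the frame out with ROWS, using the question's sample data inlined as a CTE (the year 2020 is assumed, as in the output above):

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH mytable AS (
    SELECT *
    FROM (VALUES
        (1, CAST('2020-11-13' AS date),  8, 3),
        (2, CAST('2020-11-11' AS date),  5, 1),
        (3, CAST('2020-11-15' AS date),  0, 3),
        (4, CAST('2020-11-18' AS date),  5, 7),
        (5, CAST('2020-11-20' AS date), 10, 0)
    ) AS v (ID, [Date], Amount1, Amount2)
)
SELECT mt.ID,
       mt.[Date],
       mt.Amount1 AS Plus,
       mt.Amount2 AS Minus,
       -- Explicit frame: every row from the start up to the current row.
       SUM(mt.Amount1 - mt.Amount2)
           OVER (ORDER BY mt.[Date], mt.ID
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Total
FROM mytable AS mt
ORDER BY mt.[Date], mt.ID;
-- Total column, in order: 4, 9, 6, 4, 14
```

With unique (Date, ID) pairs the ROWS and RANGE frames give the same totals; ROWS just makes the intent explicit.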
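You can achieve this with a CTE followed by a self join. Note that the query below adds only the previous row's (amount1 - amount2) to the current row's, rather than accumulating all earlier rows. For id = 3, amount1 - amount2 is 0 - 3 = -3, so from id = 3 onward the result below differs from the expected output.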
DECLARE @t table(id int, dateval date, amount1 int, amount2 int)
INSERT INTO @t
values
(1 ,'2020-11-13', 8, 3),
(2 ,'2020-11-11', 5, 1),
(3 ,'2020-11-15', 0, 3),
(4 ,'2020-11-18', 5, 7),
(5 ,'2020-11-20',10, 0);
;WITH CTE_First AS
(
SELECT id, dateval, amount1 as plus, amount2 as minus, (amount1-amount2) as total ,
ROW_NUMBER() OVER (ORDER BY dateval) as rnk
FROM @t
)
SELECT c.ID, c.DATEVAL, c.plus,c.minus,c.total + isnull(c1.total,0) as new_total
FROM CTE_First AS c
left outer join CTE_First AS C1
on C1.rnk = c.rnk- 1
+----+------------+------+-------+-----------+
| ID | DATEVAL | plus | minus | new_total |
+----+------------+------+-------+-----------+
| 2 | 2020-11-11 | 5 | 1 | 4 |
| 1 | 2020-11-13 | 8 | 3 | 9 |
| 3 | 2020-11-15 | 0 | 3 | 2 |
| 4 | 2020-11-18 | 5 | 7 | -5 |
| 5 | 2020-11-20 | 10 | 0 | 8 |
+----+------------+------+-------+-----------+
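If the full running total is wanted from a self join (for example, on a version without windowed SUM ... ORDER BY), one possible variation joins each row to every row at or before it and sums the differences. This is a sketch, not the answer's original query; the running_total alias and the id tie-breaker in ROW_NUMBER are additions.

```sql
DECLARE @t table (id int, dateval date, amount1 int, amount2 int);
INSERT INTO @t
VALUES
(1, '2020-11-13',  8, 3),
(2, '2020-11-11',  5, 1),
(3, '2020-11-15',  0, 3),
(4, '2020-11-18',  5, 7),
(5, '2020-11-20', 10, 0);

WITH CTE_First AS
(
    SELECT id, dateval, amount1 AS plus, amount2 AS minus,
           amount1 - amount2 AS total,
           ROW_NUMBER() OVER (ORDER BY dateval, id) AS rnk
    FROM @t
)
SELECT c.id, c.dateval, c.plus, c.minus,
       SUM(c1.total) AS running_total   -- sum of all rows up to this one
FROM CTE_First AS c
JOIN CTE_First AS c1
    ON c1.rnk <= c.rnk                  -- every earlier row plus the row itself
GROUP BY c.id, c.dateval, c.plus, c.minus, c.rnk
ORDER BY c.rnk;
-- running_total, in order: 4, 9, 6, 4, 14
```

The join compares every row with all earlier rows, so it scales quadratically; on large tables the window-function version is likely the better choice.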

Best-performing approach for getting the last records sorted by date

I have a table with more than 60 million records. This table has 510 columns. All columns are doubles.
I created an index on the date field:
CREATE INDEX index_btree_date ON mytable USING BTREE (date);
The table looks like this:
--------------------------------------------------------------
| id | date | fk_system | col1 | col2 | col-n |
--------------------------------------------------------------
| 1 | 2020-08-05 15:00:00 | 1 | 1 | 2 | 3 |
--------------------------------------------------------------
| 2 | 2020-08-05 15:00:00 | 2 | 1 | 2 | 3 |
--------------------------------------------------------------
| 3 | 2020-08-05 15:01:00 | 1 | 1 | 2 | 3 |
--------------------------------------------------------------
| 4 | 2020-08-05 15:01:00 | 2 | 1 | 2 | 3 |
--------------------------------------------------------------
| 5 | 2020-08-05 15:02:00 | 1 | 1 | 2 | 3 |
--------------------------------------------------------------
| 6 | 2020-08-05 15:02:00 | 2 | 1 | 2 | 3 |
--------------------------------------------------------------
| 7 | 2020-08-05 15:03:00 | 1 | 1 | 2 | 3 |
--------------------------------------------------------------
| 8 | 2020-08-05 15:03:00 | 2 | 1 | 2 | 3 |
--------------------------------------------------------------
I tried to run this query:
SELECT t.id, t.date, t.fk_system, t.col1, t.col2, t.col3
FROM mytable t
WHERE t.fk_system = 106
ORDER BY t.date DESC
LIMIT 2880;
This query takes more than 5 minutes to run, which is very bad performance.
Can someone help me?
Thanks!
For this query:
SELECT t.id, t.date, t.fk_system, t.col1, t.col2, t.col3
FROM mytable t
WHERE t.fk_system = 106
ORDER BY t.date DESC
LIMIT 2880;
You want an index on (fk_system, date desc). An index where date is first does not really help with this query.
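A sketch of what that could look like in PostgreSQL. The index name is made up; the table and column names are the ones from the question.

```sql
-- Hypothetical index name. On a large, busy table,
-- CREATE INDEX CONCURRENTLY avoids blocking writes while the index builds.
CREATE INDEX index_btree_system_date
    ON mytable USING BTREE (fk_system, date DESC);

-- Check whether the planner picks up the new index for the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.id, t.date, t.fk_system, t.col1, t.col2, t.col3
FROM mytable t
WHERE t.fk_system = 106
ORDER BY t.date DESC
LIMIT 2880;
```

With fk_system as the leading column, the rows for one system are already stored in date order inside the index, so the database can read the first 2880 entries and stop instead of sorting or scanning by date across all systems.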

Selecting latest consecutive records that match a condition with PostgreSQL

I am looking for a PostgreSQL query to find the latest consecutive records that match a condition. Let me explain it better with an example:
| ID | HEATING STATE | DATE |
| ---- | --------------- | ---------- |
| 1 | ON | 2018-02-19 |
| 2 | ON | 2018-02-20 |
| 3 | OFF | 2018-02-20 |
| 4 | OFF | 2018-02-21 |
| 5 | ON | 2018-02-21 |
| 6 | OFF | 2018-02-21 |
| 7 | ON | 2018-02-22 |
| 8 | ON | 2018-02-22 |
| 9 | ON | 2018-02-22 |
| 10 | ON | 2018-02-23 |
I need to find all the recent consecutive records with date >= 2018-02-20 and heating_state ON, i.e. the ones with ID 7, 8, 9, 10. My main issue is with the fact that they must be consecutive.
For further clarification, if needed:
ID 1 is excluded because older than 2018-02-20
ID 2 is excluded because followed by ID 3 which has heating state OFF
ID 3 is excluded because it has heating state OFF
ID 4 is excluded because it is followed by ID 5, which has heating OFF
ID 5 is excluded because it has heating state OFF
ID 6 is excluded because it has heating state OFF
I think this is best solved using window functions and a filtered aggregate.
For each row, add the number of later rows that have state = 'OFF', then use only the rows where that count is 0.
You need a subquery because you cannot use a window function result in the WHERE condition (WHERE is evaluated before window functions).
SELECT id, state, date
FROM (SELECT id, state, date,
count(*) FILTER (WHERE state = 'OFF')
OVER (ORDER BY date DESC, state DESC) AS later_off_count
FROM tab) q
WHERE later_off_count = 0;
id | state | date
----+-------+------------
10 | ON | 2018-02-23
9 | ON | 2018-02-22
8 | ON | 2018-02-22
7 | ON | 2018-02-22
(4 rows)
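Several rows in the sample share the same date, so the date alone does not say which of them is later. The sketch below applies the same idea but orders by id, on the assumption that id reflects insertion order, and adds the date condition from the question. The table name heating_log and the column names are made up for the example.

```sql
-- Hypothetical test table mirroring the question's sample data.
CREATE TEMP TABLE heating_log (
    id            int  PRIMARY KEY,
    heating_state text NOT NULL,
    dt            date NOT NULL
);

INSERT INTO heating_log (id, heating_state, dt) VALUES
    (1,  'ON',  '2018-02-19'),
    (2,  'ON',  '2018-02-20'),
    (3,  'OFF', '2018-02-20'),
    (4,  'OFF', '2018-02-21'),
    (5,  'ON',  '2018-02-21'),
    (6,  'OFF', '2018-02-21'),
    (7,  'ON',  '2018-02-22'),
    (8,  'ON',  '2018-02-22'),
    (9,  'ON',  '2018-02-22'),
    (10, 'ON',  '2018-02-23');

SELECT id, heating_state, dt
FROM (SELECT id, heating_state, dt,
             -- Number of OFF rows from the newest row down to this one.
             count(*) FILTER (WHERE heating_state = 'OFF')
                 OVER (ORDER BY id DESC) AS later_off_count
      FROM heating_log) q
WHERE later_off_count = 0          -- nothing was switched off at or after this row
  AND dt >= DATE '2018-02-20'      -- date condition from the question
ORDER BY id;
-- Returns ids 7, 8, 9 and 10.
```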
Use the LEAD function with a CASE expression.
Query 1:
SELECT id,
heating_state,
dt
FROM (SELECT t.*,
CASE
WHEN dt >= timestamp '2018-02-20'
AND heating_state = 'ON'
AND LEAD(heating_state, 1, heating_state)
OVER (
ORDER BY dt ) = 'ON' THEN 1
ELSE 0
END on_state
FROM t) s
WHERE on_state = 1
Results:
| id | heating_state | dt |
|----|---------------|----------------------|
| 7 | ON | 2018-02-22T00:00:00Z |
| 8 | ON | 2018-02-22T00:00:00Z |
| 9 | ON | 2018-02-22T00:00:00Z |
| 10 | ON | 2018-02-23T00:00:00Z |

SQL - Finding sequence of events

I need some help identifying a sequence of events in SQL Server 2008 R2.
This is the sample data:
ID | SampleTime | SampleValue | CycleNum
1 | 07:00:00 | 10 |
2 | 07:02:00 | 10 |
3 | 07:05:00 | 10 |
4 | 07:12:00 | 20 |
5 | 07:15:00 | 10 |
6 | 07:22:00 | 10 |
7 | 07:23:00 | 20 |
8 | 07:30:00 | 20 |
9 | 07:31:00 | 10 |
I have used the following as a guide (link), but it doesn't give the required output.
The rules are:
A cycle starts at 10 and finishes at 20
There can be multiple 10s before a 20, and multiple 20s before the next 10
A cycle will always start at the first 10, and finish on the last 20 before the next 10.
Example Output
ID | SampleTime | SampleValue | CycleNum
1 | 07:00:00 | 10 | 1
2 | 07:02:00 | 10 | 1
3 | 07:05:00 | 10 | 1
4 | 07:12:00 | 20 | 1
5 | 07:15:00 | 10 | 2
6 | 07:22:00 | 10 | 2
7 | 07:23:00 | 20 | 2
8 | 07:30:00 | 20 | 2
9 | 07:31:00 | 10 | 3
Test Table
CREATE TABLE myTable (ID INT IDENTITY, SampleTime DATETIME, SampleValue INT, CycleNum INT)
INSERT INTO myTable (SampleTime, SampleValue)
VALUES ('07:00:00',10),
('07:02:00',10),
('07:05:00',10),
('07:12:00',20),
('07:15:00',10),
('07:22:00',10),
('07:23:00',20),
('07:30:00',20),
('07:31:00',10)
Try this. It gives the mapping of ID to CycleNum:
WITH EVE_DATA AS (
SELECT ID
, SAMPLETIME
, SAMPLEVALUE
, CASE
WHEN (SAMPLEVALUE - lag(SAMPLEVALUE, 1, 0) over (order by SAMPLETIME ASC)) = -10
THEN 1
ELSE 0
END AS START_IND
FROM
myTable
)
SELECT T1.id
, SUM(T2.START_IND) + 1 AS CycleNum
FROM EVE_DATA T1
JOIN EVE_DATA T2
ON T1.ID >= T2.ID
GROUP BY T1.ID
ORDER BY T1.ID;
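LAG is only available from SQL Server 2012 onward, so the query above would not run on the 2008 R2 instance mentioned in the question. A possible variation for older versions looks up the previous row with OUTER APPLY instead. This is a sketch that assumes ID follows SampleTime order, as it does in the test table; the Starts and StartInd names are made up.

```sql
WITH Starts AS (
    SELECT cur.ID,
           -- A new cycle starts on a 10 that directly follows a 20.
           CASE WHEN cur.SampleValue = 10 AND prev.SampleValue = 20
                THEN 1 ELSE 0
           END AS StartInd
    FROM myTable AS cur
    OUTER APPLY (
        -- Previous row by ID; yields NULL for the very first row.
        SELECT TOP (1) p.SampleValue
        FROM myTable AS p
        WHERE p.ID < cur.ID
        ORDER BY p.ID DESC
    ) AS prev
)
SELECT t1.ID,
       SUM(t2.StartInd) + 1 AS CycleNum   -- running count of cycle starts
FROM Starts AS t1
JOIN Starts AS t2
    ON t2.ID <= t1.ID
GROUP BY t1.ID
ORDER BY t1.ID;
-- With the test data: IDs 1-4 -> 1, IDs 5-8 -> 2, ID 9 -> 3
```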

Making a partition query, reporting the first NOT NULL occurrence within partition before current row (if any)

I have a logins table which looks like this:
person_id | login_at | points_won
-----------+----------------+----------------------
1 | 2017-02-02 |
1 | 2017-02-01 |
2 | 2017-02-01 | 2
1 | 2017-01-29 | 2
2 | 2017-01-28 |
2 | 2017-01-25 | 1
3 | 2017-01-22 |
3 | 2017-01-21 |
1 | 2017-01-10 | 3
1 | 2017-01-01 | 1
I want to generate a result set containing a last_points_won column, which should work like this: for each row, partition by person_id, order the partition by login_at desc, then report the first non-null occurrence of points_won among the ordered rows in the partition (if any).
It should result in something like this:
person_id | login_at | points_won | last_points_won
-----------+----------------+----------------------+----------------------
1 | 2017-02-02 | | 2
1 | 2017-02-01 | | 2
2 | 2017-02-01 | 2 | 2
1 | 2017-01-29 | 2 | 2
2 | 2017-01-28 | | 1
2 | 2017-01-25 | 1 | 1
3 | 2017-01-22 | |
3 | 2017-01-21 | |
1 | 2017-01-10 | 3 | 3
1 | 2017-01-01 | 1 | 1
Or in plain words:
for each row, give me either the points won during this login or, if none, the
points won at the person's latest previous login where they actually won some
points.
This could be achieved within a single window too, with the IGNORE NULLS option of the last_value() window function. But that's not supported in PostgreSQL yet. One alternative is the FILTER (WHERE ...) clause, but that will only work, when the window function is an aggregate function in the first place (which is not true for last_value(), but something similar could be created easily with CREATE AGGREGATE). To solve this with only built-in aggregates, you can use the array_agg() too:
SELECT (tbl).*,
all_points_won[array_upper(all_points_won, 1)] last_points_won
FROM (SELECT tbl,
array_agg(points_won)
FILTER (WHERE points_won IS NOT NULL)
OVER (PARTITION BY person_id ORDER BY login_at) all_points_won
FROM tbl) s
Note: the sub-query is not needed, if you create a dedicated last_agg() aggregate, like:
CREATE FUNCTION last_val(anyelement, anyelement)
RETURNS anyelement
LANGUAGE SQL
IMMUTABLE
CALLED ON NULL INPUT
AS 'SELECT $2';
CREATE AGGREGATE last_agg(anyelement) (
SFUNC = last_val,
STYPE = anyelement
);
SELECT tbl.*,
last_agg(points_won)
FILTER (WHERE points_won IS NOT NULL)
OVER (PARTITION BY person_id ORDER BY login_at) last_points_won
FROM tbl;
Edit: once the IGNORE NULLS option is supported in PostgreSQL, you can use the following query (which should work in Amazon Redshift too):
SELECT tbl.*,
last_value(points_won IGNORE NULLS)
OVER (PARTITION BY person_id ORDER BY login_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) last_points_won
FROM tbl;
Another option is to build groups with a running count: count(points_won) ignores NULLs, so the running count only increases on rows that have points, and each NULL row lands in the same group as the latest earlier row with points. Each group then holds at most one non-NULL value, which min() picks up:
select *
,min(points_won) over
(
partition by person_id,group_id
) as last_points_won
from (select *
,count(points_won) over
(
partition by person_id
order by login_at
) as group_id
from mytable
) t
+-----------+------------+------------+----------+-----------------+
| person_id | login_at | points_won | group_id | last_points_won |
+-----------+------------+------------+----------+-----------------+
| 1 | 2017-01-01 | 1 | 1 | 1 |
+-----------+------------+------------+----------+-----------------+
| 1 | 2017-01-10 | 3 | 2 | 3 |
+-----------+------------+------------+----------+-----------------+
| 1 | 2017-01-29 | 2 | 3 | 2 |
+-----------+------------+------------+----------+-----------------+
| 1 | 2017-02-01 | (null) | 3 | 2 |
+-----------+------------+------------+----------+-----------------+
| 1 | 2017-02-02 | (null) | 3 | 2 |
+-----------+------------+------------+----------+-----------------+
| 2 | 2017-01-25 | 1 | 1 | 1 |
+-----------+------------+------------+----------+-----------------+
| 2 | 2017-01-28 | (null) | 1 | 1 |
+-----------+------------+------------+----------+-----------------+
| 2 | 2017-02-01 | 2 | 2 | 2 |
+-----------+------------+------------+----------+-----------------+
| 3 | 2017-01-21 | (null) | 0 | (null) |
+-----------+------------+------------+----------+-----------------+
| 3 | 2017-01-22 | (null) | 0 | (null) |
+-----------+------------+------------+----------+-----------------+