SQL SUM stop if hit a threshold


I am doing back-testing where I need to calculate each store's losses if I apply a $1000 threshold block on sale_amount.
For example, for store_id = a, the first two rows add up to 700, but the third transaction ($400) still goes through, for a total of 700 + 400 = 1100. A batch process then runs and triggers the 1000 block, so the 4th transaction is blocked. What I need to calculate is the total amount after the block is triggered, which for store_id = a is $99. For store b it is $800+100+50.
This is my sample data. Please advise how to use a temporary table to solve this.
Create table stadium
(
Trans_id int,
Store_id varchar,
sale_amount int
)
insert into stadium (Trans_id, Store_id, sale_amount) values ('1', 'a', '500')
insert into stadium (Trans_id, Store_id, sale_amount) values ('2', 'a', '200')
insert into stadium (Trans_id, Store_id, sale_amount) values ('3', 'a', '400')
insert into stadium (Trans_id, Store_id, sale_amount) values ('4', 'a', '99')
insert into stadium (Trans_id, Store_id, sale_amount) values ('5', 'b', '700')
insert into stadium (Trans_id, Store_id, sale_amount) values ('6', 'b', '100')
insert into stadium (Trans_id, Store_id, sale_amount) values ('7', 'b', '800')
insert into stadium (Trans_id, Store_id, sale_amount) values ('8', 'b', '100')
insert into stadium (Trans_id, Store_id, sale_amount) values ('9', 'b', '50')

I think this requires a recursive CTE -- because you hit a threshold, but skip and keep on going.
So, this does what you describe:
with s as (
select s.*, row_number() over (partition by store_id order by trans_id) as seqnum
from stadium s
),
cte as (
select store_id, trans_id, sale_amount as running_amount, 1 as include_flag, seqnum
from s
where seqnum = 1
union all
select s.store_id, s.trans_id,
(case when s.sale_amount + cte.running_amount <= 1000 then s.sale_amount + cte.running_amount else cte.running_amount end),
(case when s.sale_amount + cte.running_amount <= 1000 then 1 else 0 end),
s.seqnum
from cte join
s
on s.store_id = cte.store_id and s.seqnum = cte.seqnum + 1
)
select s.*
from cte join
s
on s.trans_id = cte.trans_id
where include_flag = 0;
Here is a db<>fiddle.
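To see which rows the recursive CTE flags, the query can be run against the sample data. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in engine (an assumption, since the answer does not name one), so the CTE needs the RECURSIVE keyword and the final select lists its columns explicitly. The variable names are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stadium (trans_id INT, store_id TEXT, sale_amount INT);
    INSERT INTO stadium (trans_id, store_id, sale_amount) VALUES
        (1, 'a', 500), (2, 'a', 200), (3, 'a', 400), (4, 'a', 99),
        (5, 'b', 700), (6, 'b', 100), (7, 'b', 800), (8, 'b', 100), (9, 'b', 50);
""")

# Same logic as the answer's query: a transaction is added to the running
# amount only if the total stays at or below 1000; otherwise it is flagged.
query = """
WITH RECURSIVE s AS (
    SELECT st.*,
           ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY trans_id) AS seqnum
    FROM stadium st
),
cte AS (
    SELECT store_id, trans_id, sale_amount AS running_amount,
           1 AS include_flag, seqnum
    FROM s
    WHERE seqnum = 1
    UNION ALL
    SELECT s.store_id, s.trans_id,
           CASE WHEN s.sale_amount + cte.running_amount <= 1000
                THEN s.sale_amount + cte.running_amount
                ELSE cte.running_amount END,
           CASE WHEN s.sale_amount + cte.running_amount <= 1000
                THEN 1 ELSE 0 END,
           s.seqnum
    FROM cte
    JOIN s ON s.store_id = cte.store_id AND s.seqnum = cte.seqnum + 1
)
SELECT s.trans_id, s.store_id, s.sale_amount
FROM cte
JOIN s ON s.trans_id = cte.trans_id
WHERE cte.include_flag = 0
ORDER BY s.trans_id
"""

flagged = conn.execute(query).fetchall()
print(flagged)
```

On the sample data this returns transaction 3 (store a, 400) and transaction 7 (store b, 800), that is, the transactions that would have pushed the running amount past 1000.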

Related

Get userwise balance and first transaction date of users in SQL

I have created a Transactions table with columns card_id, amount, created_at. There may be more than one row per user, so I want to return card_id, sum(amount), and the first created_at date for every user.
CREATE TABLE Transactions(card_id int, amount money, created_at date)
INSERT INTO Transactions(card_id, amount, created_at)
SELECT 1, 500, '2016-01-01' union all
SELECT 1, 100, '2016-01-01' union all
SELECT 1, 100, '2016-01-01' union all
SELECT 1, 200, '2016-01-02' union all
SELECT 1, 300, '2016-01-03' union all
SELECT 2, 100, '2016-01-04' union all
SELECT 2, 200, '2016-01-05' union all
SELECT 3, 700, '2016-01-06' union all
SELECT 1, 100, '2016-01-07' union all
SELECT 2, 100, '2016-01-07' union all
SELECT 3, 100, '2016-01-07'
I have created a function for that, but one of my clients says I need a query, not a function. Can anyone here suggest what query to use?
CREATE FUNCTION [dbo].[card_id_data]()
RETURNS @t TABLE
(
card_id text,
amount money,
dateOfFirstTransaction date
)
AS
BEGIN
INSERT INTO @t(card_id)
SELECT DISTINCT(card_id) FROM Transactions;
UPDATE @t
SET dateOfFirstTransaction = b.createdat
FROM
(SELECT DISTINCT(card_id) cardid,
MIN(created_at) createdat
FROM Transactions
WHERE amount < 0
GROUP BY card_id) b
WHERE card_id = b.cardid;
UPDATE @t
SET amount = T.AMOUNT
FROM
(SELECT
card_id AS cardid, SUM(MIN(AMOUNT)) AMOUNT, created_at
FROM Transactions
WHERE amount < 0
GROUP BY card_id, created_at) T
WHERE card_id = cardid
AND dateOfFirstTransaction = created_at;
RETURN
END
I want a result as shown in this screenshot:

You can use DENSE_RANK for this. It will number the rows, taking into account tied places (same dates)
SELECT
t.card_id,
SumAmount = SUM(amount),
FirstDate = MIN(t.created_at)
FROM (
SELECT *,
rn = DENSE_RANK() OVER (PARTITION BY t.card_id ORDER BY t.created_at)
FROM dbo.Transactions t
) t
WHERE t.rn = 1
GROUP BY t.card_id;
If the dates are actually dates and times, and you want to sum the whole day, change t.created_at to CAST(t.created_at AS date)
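As a rough check of the DENSE_RANK approach, the sketch below runs an equivalent query on the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption), so the `alias = expression` form is written as `expression AS alias` and the column types are simplified.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Transactions (card_id INT, amount INT, created_at TEXT);
    INSERT INTO Transactions (card_id, amount, created_at) VALUES
        (1, 500, '2016-01-01'), (1, 100, '2016-01-01'), (1, 100, '2016-01-01'),
        (1, 200, '2016-01-02'), (1, 300, '2016-01-03'), (2, 100, '2016-01-04'),
        (2, 200, '2016-01-05'), (3, 700, '2016-01-06'), (1, 100, '2016-01-07'),
        (2, 100, '2016-01-07'), (3, 100, '2016-01-07');
""")

# DENSE_RANK gives every row on a card's earliest date the rank 1,
# so filtering on rn = 1 keeps all transactions of that first day.
query = """
SELECT t.card_id,
       SUM(t.amount)     AS SumAmount,
       MIN(t.created_at) AS FirstDate
FROM (
    SELECT tr.*,
           DENSE_RANK() OVER (PARTITION BY tr.card_id
                              ORDER BY tr.created_at) AS rn
    FROM Transactions tr
) t
WHERE t.rn = 1
GROUP BY t.card_id
ORDER BY t.card_id
"""

first_day_totals = conn.execute(query).fetchall()
for row in first_day_totals:
    print(row)
```

For card 1 the three rows dated 2016-01-01 tie for rank 1, so their amounts (500 + 100 + 100) are summed together.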

Try this:
/*
CREATE TABLE dbo.Transactions
(
card_id INT,
amount MONEY,
created_at DATE
);
INSERT INTO dbo.Transactions (card_id, amount, created_at)
VALUES (1, 500, '2016-01-01'),
(1, 100, '2016-01-01'),
(1, 100, '2016-01-01'),
(1, 200, '2016-01-02'),
(1, 300, '2016-01-03'),
(2, 100, '2016-01-04'),
(2, 200, '2016-01-05'),
(3, 700, '2016-01-06'),
(1, 100, '2016-01-07'),
(2, 100, '2016-01-07'),
(3, 100, '2016-01-07');
*/
WITH FirstDatePerCard AS
(
SELECT
card_id,
FirstDate = MIN(created_at)
FROM
dbo.Transactions
GROUP BY
card_id
)
SELECT DISTINCT
t.card_id,
SumAmount = SUM(amount) OVER (PARTITION BY t.card_id),
FirstDate = f.FirstDate
FROM
FirstDatePerCard f
INNER JOIN
dbo.Transactions t ON f.card_id = t.card_id AND f.FirstDate = t.created_at
You'll get an output something like this:
card_id SumAmount FirstDate
--------------------------------
1 700.00 2016-01-01
2 100.00 2016-01-04
3 700.00 2016-01-06
Is that what you're looking for??
UPDATE: OK, so you want to sum the amount only for the first_date, for every card_id - is that correct? (wasn't clear from the original question)
Updated my solution accordingly

SQL not using ALIAS column for calculation

Question Statement - From the given trips and users tables for a taxi service, write a query to return the cancellation rate in the first two days in October, rounded to two decimal places, for trips not involving banned riders or drivers.
Question code on Oracle SQL.
create table trips (trip_id int, rider_id int, driver_id int, status varchar2(200), request_date date);
insert into trips values (1, 1, 10, 'completed', to_date ('2020-10-01', 'YYYY/MM/DD'));
insert into trips values (2, 2, 11, 'cancelled_by_driver', to_date ('2020-10-01', 'YYYY/MM/DD'));
insert into trips values (3, 3, 12, 'completed', to_date ('2020-10-01', 'YYYY/MM/DD'));
insert into trips values (4, 4, 10, 'cancelled_by_driver', to_date ('2020-10-02', 'YYYY/MM/DD'));
insert into trips values (5, 1, 11, 'completed', to_date ('2020-10-02', 'YYYY/MM/DD'));
insert into trips values (6, 2, 12, 'completed', to_date ('2020-10-02', 'YYYY/MM/DD'));
insert into trips values (7, 3, 11, 'completed', to_date ('2020-10-03', 'YYYY/MM/DD'));
create table users (user_id int, banned varchar2(200), type varchar2(200));
insert into users values (1, 'no', 'rider');
insert into users values (2, 'yes', 'rider');
insert into users values (3, 'no', 'rider');
insert into users values (4, 'no', 'rider');
insert into users values (10, 'no', 'driver');
insert into users values (11, 'no', 'driver');
insert into users values (12, 'no', 'driver');
My solution code is below. However, I get the following error. Can someone please help?
ORA-00904: "TOTAL_TRIPS": invalid identifier
SOLUTION CODE:
select request_date, (1-(trips_completed/total_trips)) as "cancel_rate"
from
((
select request_date,
sum(case when status = 'completed' then 1 else 0 end) as "trips_completed",
sum(case when status = 'cancelled_by_driver' then 1 else 0 end) as "trips_cancelled",
sum(case when status = 'cancelled_by_driver' then 1 when status= 'completed' then 1 else 0 end) as "total_trips"
from
(
select t.rider_id, t.driver_id, t.status, t.request_date, u.banned as "not_banned_rider", u.banned as "not_banned_driver"
from trips t
join users u
on t.rider_id=u.user_id
where u.banned='no'
)
group by request_date
having request_date <> to_date ('2020-10-03', 'YYYY/MM/DD')
));

First, don't put identifiers in double quotes. They just clutter up queries.
Some other things to fix:
No need for two levels of subqueries.
Learn to use proper date literal syntax.
I think you want < rather than <>.
So that suggests:
select request_date, (1-(trips_completed/total_trips)) as cancel_rate
from (select request_date,
sum(case when status = 'completed' then 1 else 0 end) as trips_completed,
sum(case when status = 'cancelled_by_driver' then 1 else 0 end) as trips_cancelled,
sum(case when status = 'cancelled_by_driver' then 1 when status = 'completed' then 1 else 0 end) as total_trips
from trips t join
users u
on t.rider_id = u.user_id
where u.banned = 'no' and
t.request_date < date '2020-10-03'
group by request_date
) rd;
This can be further simplified using avg():
select request_date,
avg(case when status = 'completed' then 0 else 1 end) as cancel_rate
from trips t join
users u
on t.rider_id = u.user_id
where u.banned = 'no' and
request_date < date '2020-10-03'
group by request_date ;
Note: This addresses fixing the query in your question. It doesn't actually correctly answer the question, for the following reasons:
I'm pretty sure the question entails one cancellation rate, not one for two dates.
It doesn't take into account banned drivers.
I'm not sure how "cancelled by user" would be handled.
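The avg() idea can be tried out on the sample data. The sketch below is a minimal illustration, assuming SQLite (via Python's sqlite3 module) in place of Oracle and plain text dates instead of DATE values. It computes the cancellation rate as the average of a 0/1 flag that is 1 for trips that were not completed and, like the query in the answer, only filters out banned riders.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (trip_id INT, rider_id INT, driver_id INT,
                        status TEXT, request_date TEXT);
    INSERT INTO trips VALUES
        (1, 1, 10, 'completed',           '2020-10-01'),
        (2, 2, 11, 'cancelled_by_driver', '2020-10-01'),
        (3, 3, 12, 'completed',           '2020-10-01'),
        (4, 4, 10, 'cancelled_by_driver', '2020-10-02'),
        (5, 1, 11, 'completed',           '2020-10-02'),
        (6, 2, 12, 'completed',           '2020-10-02'),
        (7, 3, 11, 'completed',           '2020-10-03');
    CREATE TABLE users (user_id INT, banned TEXT, type TEXT);
    INSERT INTO users VALUES
        (1, 'no', 'rider'), (2, 'yes', 'rider'), (3, 'no', 'rider'),
        (4, 'no', 'rider'), (10, 'no', 'driver'), (11, 'no', 'driver'),
        (12, 'no', 'driver');
""")

# The flag is 1.0 for every trip that was not completed, so its average
# per day is the share of cancelled trips among non-banned riders.
query = """
SELECT t.request_date,
       AVG(CASE WHEN t.status = 'completed' THEN 0.0 ELSE 1.0 END) AS cancel_rate
FROM trips t
JOIN users u ON t.rider_id = u.user_id
WHERE u.banned = 'no'
  AND t.request_date < '2020-10-03'
GROUP BY t.request_date
ORDER BY t.request_date
"""

cancel_rates = conn.execute(query).fetchall()
print(cancel_rates)
```

Rider 2 is banned, so trips 2 and 6 drop out. That leaves two completed trips on 2020-10-01 (rate 0.0) and one cancelled plus one completed trip on 2020-10-02 (rate 0.5).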

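ORA-00904: "TOTAL_TRIPS": invalid identifier
means that Oracle cannot find a column named TOTAL_TRIPS. The inner query defines the alias in double quotes as "total_trips", which makes it case-sensitive and lowercase, while the unquoted total_trips in the outer query is folded to uppercase TOTAL_TRIPS, so the two do not match.
Just use total_trips without the quotes when defining the alias.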

Looking for the same Trader buying and selling the same product within the 3minutes

Below I've the example table
Create Table #A
(
Time nvarchar(70),
Trader nvarchar(30),
Product nvarchar(30),
[Buy/Sell] nvarchar(30)
)
Insert into #A Values
('2019-03-01T14:22:29z', 'Jhon', 'Apple', 'Buy'),
('2019-03-01T12:35:09z', 'Jhon', 'Orange', 'Sell'),
('2019-03-01T12:35:09z', 'Mary', 'Milk', 'Buy'),
('2019-03-01T12:35:10z', 'Susan', 'Milk', 'Buy'),
('2019-03-01T12:35:23z', 'Tom', 'Bread', 'Sell'),
('2019-03-01T14:15:52z', 'Jhon', 'Apple', 'Sell'),
('2019-03-01T14:15:53z', 'Tom', 'Orange', 'Sell'),
('2019-03-01T14:22:33z', 'Mary', 'Apple', 'Buy'),
('2019-03-01T14:22:37z', 'Mary', 'Orange', 'Sell'),
('2019-03-01T12:37:41z', 'Susan', 'Milk', 'Buy'),
('2019-03-01T12:37:41z', 'Susan', 'Milk', 'Buy')
Select * from #A
Basically, I want to find cases where the same trader buys and sells the same product within 3 minutes.
Below is what I have tried, but it is not correct and does not work:
;With DateTimeTbl
as
(
select SUBSTRING(a.Time,1,10) date, SUBSTRING(a.Time,12,8) Time1, a.*
-- lead(Time) over(order by time) cnt
from #A a
),
DataTbl
as
(
Select d.*, row_number() over(Partition by d.Trader,d.product order by d.time1) CntSrs
from DateTimeTbl d
--where [buy/sell] = 'Sell'
)
Select lag(Time1) over(order by time) cnt, d.* from DataTbl d where CntSrs>1

Basically I'm to get the same Trader buying and selling the same product within the 3minutes
I would suggest lead(). To get the first record:
select a.*
from (select a.*,
lead(time) over (partition by trader, product order by time) as next_time,
lead(buy_sell) over (partition by trader, product order by time) as next_buy_sell
from #a a
) a
where next_time < dateadd(minute, 3, time) and
buy_sell <> next_buy_sell;
Note: This assumes that buy_sell takes on only two values, which is consistent with your sample data.
Here is a db<>fiddle. Note that it fixes the data types to be appropriate (for the time column) and renames the last column so it does not need to be escaped.
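The lead() idea can be sketched end to end as follows. This is only an illustration under several assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, the columns are renamed to trade_time and buy_sell, the time difference is computed with strftime('%s', ...) instead of dateadd, and the rows are hypothetical data chosen so that exactly one pair falls inside the 3-minute window.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (trade_time TEXT, trader TEXT, product TEXT, buy_sell TEXT)"
)

# Hypothetical rows, not the question's data.
trades = [
    ("2019-03-01 14:15:52", "Jhon", "Apple", "Sell"),
    ("2019-03-01 14:17:10", "Jhon", "Apple", "Buy"),   # 78 s later, opposite side
    ("2019-03-01 14:22:33", "Mary", "Apple", "Buy"),
    ("2019-03-01 14:30:00", "Mary", "Apple", "Sell"),  # more than 3 minutes later
    ("2019-03-01 12:35:10", "Susan", "Milk", "Buy"),
    ("2019-03-01 12:37:41", "Susan", "Milk", "Buy"),   # within 3 minutes, same side
]
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", trades)

# LEAD looks at the next trade of the same trader and product; a row is
# kept when that next trade is on the other side and under 180 s away.
query = """
SELECT a.trade_time, a.trader, a.product, a.buy_sell
FROM (
    SELECT t.*,
           LEAD(trade_time) OVER (PARTITION BY trader, product
                                  ORDER BY trade_time) AS next_time,
           LEAD(buy_sell)   OVER (PARTITION BY trader, product
                                  ORDER BY trade_time) AS next_buy_sell
    FROM trades t
) a
WHERE strftime('%s', a.next_time) - strftime('%s', a.trade_time) < 180
  AND a.buy_sell <> a.next_buy_sell
ORDER BY a.trade_time
"""

matches = conn.execute(query).fetchall()
print(matches)
```

Only Jhon's Apple sale qualifies: Mary's opposite trade comes too late, and Susan's second trade is on the same side. The last trade in each partition has no next row, so its NULL comparison filters it out.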

How do I make an aggregate on an integer with a grouped column, for which I only want some included?

I have a table prices holding all prices that some products have had:
CREATE TABLE prices (
id INT,
product_id INT, /*Foreign key*/
created_at TIMESTAMP,
price INT
);
The first entry for a product_id is its initial sales price. If the product is then reduced, a new entry is added.
I would like to find the mean and total price change per day across all products.
This is some sample data:
INSERT INTO prices (id, product_id, created_at, price) VALUES (1, 1, '2020-01-01', 11000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (2, 2, '2020-01-01', 3999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (3, 3, '2020-01-01', 9999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (4, 4, '2020-01-01', 2000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (5, 1, '2020-01-02', 9999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (6, 2, '2020-01-02', 2999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (7, 5, '2020-01-02', 2999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (8, 1, '2020-01-03', 8999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (9, 1, '2020-01-03 10:00:00', 7000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (10, 5, '2020-01-03', 4000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (11, 6, '2020-01-03', 3999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (12, 3, '2020-01-03', 6999);
The expected result should be:
date mean_price_change total_price_change
2020-01-01 0 0
2020-01-02 1000.5 2001
2020-01-03 1666 4998
Explanation:
The mean and total price reduction on '2020-01-01' was 0, as all products were new on that date.
On '2020-01-02', however, the mean price change was (11000-9999 + 3999-2999)/2 = 1000.5, as both product_id 1 and 2 were reduced to 9999 and 2999 on that day from their previous prices of 11000 and 3999, and their total reduction was (11000-9999 + 3999-2999) = 2001.
On '2020-01-03' only product_id 1, 3 and 5 were changed: 1 at two different times on the day, 9999 => 8999 => 7000 (the last one governing); 3 going from 9999 => 6999; and 5 going up from 2999 => 4000. This gives a total of (9999-7000 + 9999-6999 + 2999-4000) = 4998 and a mean price reduction on that day of 1666.
I have added the data here too: https://www.db-fiddle.com/f/tJgoKFMJxcyg5gLDZMEP77/1
I started to play around with DISTINCT ON, but that does not seem to do it...

You seem to want lag() and aggregation:
select created_at, avg(prev_price - price), sum(prev_price - price)
from (select p.*, lag(price) over (partition by product_id order by created_at) as prev_price
from prices p
) p
group by created_at
order by created_at;
You have two prices for product 1 on 2020-01-03. Once I fix that, I get the same results as in your question. Here is the db<>fiddle.
EDIT:
To handle multiple prices per day:
select created_at::date as date, avg(prev_price - price), sum(prev_price - price)
from (select p.*, lag(price) over (partition by product_id order by created_at) as prev_price
from (select distinct on (product_id, created_at::date) p.*
from prices p
-- created_at desc keeps the last price of each day
order by product_id, created_at::date, created_at desc
) p
) p
group by created_at::date
order by created_at::date;
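The lag-and-aggregate approach can be checked against the expected result from the question. The sketch below is an illustration under assumptions: it uses SQLite through Python's sqlite3 module instead of Postgres, so DISTINCT ON is replaced by a ROW_NUMBER filter that keeps the last price of each day, and COALESCE turns the all-NULL first day into 0 to match the question's table. The CTE names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices (id INT, product_id INT, created_at TEXT, price INT);
    INSERT INTO prices (id, product_id, created_at, price) VALUES
        (1, 1, '2020-01-01', 11000), (2, 2, '2020-01-01', 3999),
        (3, 3, '2020-01-01', 9999),  (4, 4, '2020-01-01', 2000),
        (5, 1, '2020-01-02', 9999),  (6, 2, '2020-01-02', 2999),
        (7, 5, '2020-01-02', 2999),  (8, 1, '2020-01-03', 8999),
        (9, 1, '2020-01-03 10:00:00', 7000), (10, 5, '2020-01-03', 4000),
        (11, 6, '2020-01-03', 3999), (12, 3, '2020-01-03', 6999);
""")

query = """
WITH last_of_day AS (
    -- keep only the latest price per product and calendar day
    SELECT product_id, day, price
    FROM (
        SELECT p.product_id, DATE(p.created_at) AS day, p.price,
               ROW_NUMBER() OVER (PARTITION BY p.product_id, DATE(p.created_at)
                                  ORDER BY p.created_at DESC) AS rn
        FROM prices p
    ) AS ranked
    WHERE rn = 1
),
with_prev AS (
    -- previous day's final price for the same product (NULL if new)
    SELECT day, price,
           LAG(price) OVER (PARTITION BY product_id ORDER BY day) AS prev_price
    FROM last_of_day
)
SELECT day,
       COALESCE(AVG(prev_price - price), 0) AS mean_price_change,
       COALESCE(SUM(prev_price - price), 0) AS total_price_change
FROM with_prev
GROUP BY day
ORDER BY day
"""

daily_changes = conn.execute(query).fetchall()
for row in daily_changes:
    print(row)
```

New products contribute a NULL difference, which AVG and SUM ignore, so they do not dilute the mean on the day they first appear.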

try this
select
created_at,
avg(change),
sum(change)
from
(
with cte as
(
select
id,
product_id,
created_at,
lag(created_at) over(order by product_id, created_at) as last_date,
price
from prices
)
select
c.id,
c.product_id,
c.created_at,
c.last_date,
p.price as last_price,
c.price,
COALESCE(p.price - c.price,0) as change
from cte c
left join prices p on c.product_id =p.product_id and c.last_date =p.created_at
where p.price != c.price or p.price is null
) tmp
group by created_at
order by created_at

The query below tracks all price changes. Notice that we join today and earlier based on the following conditions:
their product is the same
earlier falls on a date before today's date
earlier is the latest item on a date earlier than today
today is the latest item on its own date
-- reductions come out positive, as in the question's expected result
select today.product_id, today.created_at, (earlier.price - today.price) as difference
from prices today
join prices earlier
on today.product_id = earlier.product_id and earlier.created_at::date < today.created_at::date
where not exists (
select 1
from prices later
where later.product_id = today.product_id and
(
-- a later row on today's date means today is not the day's final price
((later.created_at::date = today.created_at::date) and (later.created_at > today.created_at)) or
-- a row between earlier and today's date means earlier is not the previous price
((later.created_at::date < today.created_at::date) and (later.created_at > earlier.created_at))
)
);
Now, let's do some aggregation:
select today.created_at::date as date, avg(earlier.price - today.price) as mean, sum(earlier.price - today.price) as total
from prices today
left join prices earlier
on today.product_id = earlier.product_id and earlier.created_at::date < today.created_at::date
where not exists (
select 1
from prices later
where later.product_id = today.product_id and
(
((later.created_at::date = today.created_at::date) and (later.created_at > today.created_at)) or
((later.created_at::date < today.created_at::date) and (later.created_at > earlier.created_at))
)
)
group by today.created_at::date
order by today.created_at::date;

Postgres select if another event exists before and after a time range

I have a table like this:
I need to select the following records:
All category A
Category B only if a category A row exists for the same name within 20 seconds before or after it
To create a test table:
CREATE TABLE test(
time TIMESTAMP,
name CHAR(10),
category CHAR(50)
);
INSERT INTO test (time, name, category)
VALUES ('2019-02-25 18:30:10', 'john', 'A'),
('2019-02-25 18:30:15', 'john', 'B'),
('2019-02-25 19:00:00', 'phil', 'A'),
('2019-02-25 20:00:00', 'tim', 'A'),
('2019-02-25 21:00:00', 'tim', 'B'),
('2019-02-25 21:00:00', 'frank', 'B');
So from the above, this is the desired output:

You can use an exists subquery to determine if there is an A row within 20 seconds:
select *
from test t1
where category = 'A'
or exists
(
select *
from test t2
where t2.category = 'A'
and t2.name = t1.name
and abs(extract(epoch from t2.time - t1.time)) < 20
)
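The EXISTS query can be tried on the test table as follows. This is a minimal sketch that assumes SQLite (through Python's sqlite3 module) instead of Postgres, so the time difference is computed with strftime('%s', ...) rather than extract(epoch from ...), and the CHAR columns become TEXT.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (time TEXT, name TEXT, category TEXT);
    INSERT INTO test (time, name, category) VALUES
        ('2019-02-25 18:30:10', 'john',  'A'),
        ('2019-02-25 18:30:15', 'john',  'B'),
        ('2019-02-25 19:00:00', 'phil',  'A'),
        ('2019-02-25 20:00:00', 'tim',   'A'),
        ('2019-02-25 21:00:00', 'tim',   'B'),
        ('2019-02-25 21:00:00', 'frank', 'B');
""")

# Keep every A row, plus any row that has an A row for the same name
# less than 20 seconds away in either direction.
query = """
SELECT t1.time, t1.name, t1.category
FROM test t1
WHERE t1.category = 'A'
   OR EXISTS (
        SELECT 1
        FROM test t2
        WHERE t2.category = 'A'
          AND t2.name = t1.name
          AND ABS(strftime('%s', t2.time) - strftime('%s', t1.time)) < 20
   )
ORDER BY t1.time, t1.name
"""

selected = conn.execute(query).fetchall()
for row in selected:
    print(row)
```

john's B row is kept because his A row is 5 seconds earlier. tim's B row is an hour after his A row and frank has no A row at all, so both are dropped.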

You can use exists. But you can also use window functions:
select t.*
from (select t.*,
max(t.time) filter (where t.category = 'A') over (partition by name order by time) as prev_a,
min(t.time) filter (where t.category = 'A') over (partition by name order by time desc) as next_a
from test t
) t
where category = 'A' or
(category = 'B' and
(prev_a > time - interval '20 second' or
next_a < time + interval '20 second'
)
);
