How to write a SQL query to find the first time when a sum is greater than a number?

I have a postgresql table:
create table orders
(
id int,
cost int,
time timestamp
);
How to write a PostgreSQL query to find the first time when sum(cost) is greater than 200?
For example:
id  cost  time
--  ----  ----------
1   120   2019-10-10
2    50   2019-11-11
3    80   2019-12-12
4    60   2019-12-16
The first time sum(cost) is greater than 200 is 2019-12-12.

This is a variation of Nick's answer (which would be correct with an ORDER BY). However, this version is more efficient:
select d.*
from (select d.*,
             sum(d.cost) over (order by d.time) as running_cost
      from orders d
     ) d
where running_cost - cost < 200 and
      running_cost >= 200;
Note that this does not require an order by in the outer query to work correctly.
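As a quick, self-contained sanity check (PostgreSQL syntax; the inline VALUES mirror the sample data from the question):

with orders (id, cost, time) as (
    values (1, 120, timestamp '2019-10-10'),
           (2,  50, timestamp '2019-11-11'),
           (3,  80, timestamp '2019-12-12'),
           (4,  60, timestamp '2019-12-16')
)
select d.*
from (select d.*,
             sum(d.cost) over (order by d.time) as running_cost
      from orders d
     ) d
where running_cost - cost < 200
  and running_cost >= 200;
-- returns id = 3, time = 2019-12-12, running_cost = 250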
There is also almost a way to solve this without using a subquery:
select o.*
from orders o
order by (sum(cost) over (order by time) >= 200) desc,
         time asc
limit 1;
The only issue is that this will return a row even if no row matches the condition. You could get around this by using a subquery in the limit:
limit (case when (select sum(cost) from orders) >= 200 then 1 else 0 end)
But then a subquery would be needed.
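Putting the two pieces together (a sketch; this assumes costs are non-negative, so the total sum tells us whether the running sum ever reaches 200, and relies on PostgreSQL accepting a scalar subquery as the limit expression, as the snippet above does):

select o.*
from orders o
order by (sum(cost) over (order by time) >= 200) desc,
         time asc
limit (case when (select sum(cost) from orders) >= 200 then 1 else 0 end);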

For PostgreSQL, you can get this result by using a CTE to calculate the SUM of cost for rows up to and including the current one, and then selecting the first row which has total cost >= 200:
WITH CTE AS (
    SELECT time,
           SUM(cost) OVER (ORDER BY time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total
    FROM orders
)
SELECT *
FROM CTE
WHERE total >= 200
ORDER BY total
LIMIT 1
Output:
time        total
----------  -----
2019-12-12  250

Related

Past 7 days running amounts average as progress per each date

So, the query is simple, but I am facing issues implementing the SQL logic. Here's the setup; suppose I have records like:
Phoneno  Company  Date      Amount
-------  -------  --------  ------
83838    xyz      20210901  100
87337    abc      20210902  500
47473    cde      20210903  600
The expected output is the past 7 days' progress as a running average of amount for each date (the current date and the 6 days before it):
Date      amount  avg
--------  ------  ---
20210901  100     100
20210902  500     300
20210903  600     400
I tried
select date, amount,
       (select avg(lg)
        from (select case when lag(amount) over (order by NULL) is null
                          then amount
                          else lag(amount) over (order by NULL)
                     end as lg
              from table
              where date >= t.date - 7)
       ) as avg
from table t;
But I am getting wrong avg values. Could anyone please help?
Note: I've tried without lag too; it gives the wrong averages as well.
You could use a self join to group the dates:
select distinct
       a.dt,
       b.dt as preceding_dt,   -- just for QA purposes
       a.amt,
       b.amt as preceding_amt, -- just for QA purposes
       avg(b.amt) over (partition by a.dt) as avg_amt
from t a
join t b on a.dt - b.dt between 0 and 6
group by a.dt, b.dt, a.amt, b.amt; -- to dedupe the data after the join
If you want to make your correlated subquery approach work, you don't really need the lag.
select dt,
       amt,
       (select avg(b.amt) from t b where a.dt - b.dt between 0 and 6) as avg_lg
from t a;
If you don't have multiple rows per date, this gets even simpler:
select dt,
       amt,
       avg(amt) over (order by dt rows between 6 preceding and current row) as avg_lg
from t;
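A minimal sketch to verify the expected output against the sample data (PostgreSQL syntax assumed; the table t and its columns follow the answers above):

with t (dt, amt) as (
    values (date '2021-09-01', 100),
           (date '2021-09-02', 500),
           (date '2021-09-03', 600)
)
select dt,
       amt,
       avg(amt) over (order by dt rows between 6 preceding and current row) as avg_lg
from t;
-- 2021-09-01 | 100 | 100
-- 2021-09-02 | 500 | 300
-- 2021-09-03 | 600 | 400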
Also, the condition DATE >= t.date - 7 that you used is open on one side, meaning it will also qualify dates (including future ones) that should not have been included.
You can use an analytic function with a windowing clause to get your results (the window is 6 days preceding through the current date, i.e. 7 days in total):
SELECT DISTINCT BillingDate,
       AVG(amount) OVER (ORDER BY BillingDate
                         RANGE BETWEEN TO_DSINTERVAL('6 00:00:00') PRECEDING
                                   AND TO_DSINTERVAL('0 00:00:00') FOLLOWING) AS RUNNING_AVG
FROM accounts
ORDER BY BillingDate;

Most elegant way to eliminate equal but opposite data using SQL

I have a relatively simple set of data that looks like this:
invoice_id  created_at  amount_in_cents  user_id
----------  ----------  ---------------  --------
22348       2019-11-07    550            31773927
22349       2019-11-08   -550            31773927
22498       2019-11-10  -3400            2389483
22499       2019-11-10   3400            2389483
22500       2019-11-11  18000            93842938
As you can see, the first two rows of the sample data are attributed to the same user_id, but are of inverse amounts (add up to 0). Same with rows 3 and 4. I want to remove all invoices where there is an inverse invoice for the same user, within 30 days of each other, leaving just the fifth row.
I could do this with python, but it would expand the process a lot. Is there a simple way to do this with SQL?
You could use not exists with a correlated subquery:
select t.*
from mytable t
where not exists (
    select 1
    from mytable t1
    where t1.user_id = t.user_id
      and greatest(t1.created_at, t.created_at)
            <= least(t1.created_at, t.created_at) + interval '30 days'
      and t1.amount_in_cents = -t.amount_in_cents
);
The not exists condition ensures that no other record exists for the same user and with an opposite amount within 30 days.
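Here is a self-contained version with the sample data inlined (PostgreSQL syntax assumed), which returns only invoice 22500:

with mytable (invoice_id, created_at, amount_in_cents, user_id) as (
    values (22348, date '2019-11-07',   550, 31773927),
           (22349, date '2019-11-08',  -550, 31773927),
           (22498, date '2019-11-10', -3400, 2389483),
           (22499, date '2019-11-10',  3400, 2389483),
           (22500, date '2019-11-11', 18000, 93842938)
)
select t.*
from mytable t
where not exists (
    select 1
    from mytable t1
    where t1.user_id = t.user_id
      and greatest(t1.created_at, t.created_at)
            <= least(t1.created_at, t.created_at) + interval '30 days'
      and t1.amount_in_cents = -t.amount_in_cents
);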
I don't think there is a simple solution to this problem. If you wanted to remove all matching pairs, then you could enumerate and remove:
select min(invoice_id), min(created_at), user_id, max(amount_in_cents) as amount_in_cents
from (select t.*,
             row_number() over (partition by user_id, amount_in_cents order by created_at) as seqnum
      from mytable t
     ) t
group by abs(amount_in_cents), user_id, seqnum
having count(*) = 1; -- only one "matching" amount
However, the limitation on 30 days is challenging and I think you might need a recursive CTE for it.
Consider the following data:
user_id  created_at  amount_in_cents
-------  ----------  ---------------
1        Jan 1        500
1        Jan 15       500
1        Feb 1       -500
1        Feb 10      -500
What result would you want?

How to select most dense 1 min in Oracle

I have a table with a timestamp column tmstmp; this table contains a log of certain events. I need to find the max number of events which occurred within any 1 min interval.
Please read carefully! I do NOT want to extract the minute fraction of the timestamps and count per minute like this:
select count(*), TO_CHAR(tmstmp,'MI')
from log_table
group by TO_CHAR(tmstmp,'MI')
order by TO_CHAR(tmstmp,'MI');
It needs to take the 1st record, look ahead until it has selected all records within 1 min of the 1st, and count them; then take the 2nd record and do the same, etc.
And the result must be a recordset of (count, starting timestamp).
Anyone has a snippet of code somewhere and care to share please?
An analytic function with a logical window can provide this information directly:
select l.tmstmp,
       count(*) over (order by tmstmp
                      range between current row
                                and interval '59.999999' second following) cnt
from log_table l
order by 1;
TMSTMP CNT
--------------------------- ----------
01.01.16 00:00:00,000000000 4
01.01.16 00:00:10,000000000 4
01.01.16 00:00:15,000000000 3
01.01.16 00:00:20,000000000 2
01.01.16 00:01:00,000000000 3
01.01.16 00:01:40,000000000 2
01.01.16 00:01:50,000000000 1
Please adjust the interval length for your precision. It must be the highest possible value below 1 minute.
To get the maximal minute use the subquery below (and don't forget you may receive more than one record with the MAX count):
with tst as (
    select l.tmstmp,
           count(*) over (order by tmstmp
                          range between current row
                                    and interval '59.999999' second following) cnt
    from log_table l
)
select * from tst where cnt = (select max(cnt) from tst);
TMSTMP CNT
--------------------------- ----------
01.01.16 00:00:00,000000000 4
01.01.16 00:00:10,000000000 4
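On Oracle 12c or later, the row-limiting clause can express the same thing without the extra subquery (a sketch; WITH TIES keeps every row sharing the maximal count):

with tst as (
    select l.tmstmp,
           count(*) over (order by tmstmp
                          range between current row
                                    and interval '59.999999' second following) cnt
    from log_table l
)
select *
from tst
order by cnt desc
fetch first 1 rows with ties;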
I think you can achieve your goal using a subquery in the SELECT statement, as follows:
SELECT tmstmp, (
    SELECT COUNT(*)
    FROM log_table t2
    WHERE t2.tmstmp >= t.tmstmp
      AND t2.tmstmp < t.tmstmp + 1 / (24*60)  -- 1/(24*60) of a day = 1 minute (DATE arithmetic)
) AS events
FROM log_table t;
One method uses a join and aggregation:
select t.*
from (select l.tmstmp, count(*) as cnt
      from log_table l join
           log_table l2
           on l2.tmstmp >= l.tmstmp and
              l2.tmstmp < l.tmstmp + interval '1' minute
      group by l.tmstmp
      order by count(*) desc
     ) t
where rownum = 1;
Note: This assumes that tmstmp is unique on each row. If this is not true, then the subquery should be aggregating by some column that is unique.
EDIT:
For large data, there is a more efficient way that makes use of cumulative sums:
select tmstmp - interval '1' minute as starttm, tmstmp as endtm, cumulative
from (select tmstmp, sum(inc) over (order by tmstmp) as cumulative
      from (select tmstmp, 1 as inc from log_table
            union all
            select tmstmp + interval '1' minute, -1 as inc from log_table
           ) t
      order by cumulative desc
     ) t
where rownum = 1;
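To see why the +1/-1 trick works, here is a minimal, self-contained illustration (Oracle syntax assumed; three hypothetical events):

with events as (
    select timestamp '2016-01-01 00:00:00' as tmstmp from dual
    union all
    select timestamp '2016-01-01 00:00:30' from dual
    union all
    select timestamp '2016-01-01 00:02:00' from dual
),
sweep as (
    select tmstmp, 1 as inc from events                           -- event enters its 1-minute window
    union all
    select tmstmp + interval '1' minute, -1 as inc from events    -- ...and leaves it
)
select tmstmp, sum(inc) over (order by tmstmp) as active_events
from sweep
order by tmstmp;
-- The running sum at time X counts events in (X - 1 minute, X];
-- its maximum (2, reached at 00:00:30) marks the densest 1-minute window.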

Finding percentage change in prices between rows

I have been trying (to no avail) to formulate an SQL query that will return rows with the greatest change in pricing between the most recent entry and the first entry more than 1 day older.
Price scraping takes a non-trivial amount of time due to a large data set, so the times between the first and last rows of one pull will often differ by many minutes. I would like to be able to pull the first record from x time or greater ago, in pseudo-SQL: SELECT price FROM table WHERE date < [now epoch time in ms] - 86400000 ORDER BY date DESC LIMIT 1
My table format is as follows: (date is epoch time in milliseconds)
itemid  price  date
------  -----  -------------
... most recent entries ...
1       15.50  1373022446000
2        5.00  1373022446000
3       20.50  1373022446000
... first entries older than X milliseconds ...
1       13.00  1372971693000
2        7.00  1372971693000
3       20.50  1372971693000
I would like to have a query that returned a result something similar to the following
itemid  abs    pct
------  -----  ------
1       +2.50  +19.2%
2       -2.00  -28.6%
3        0.00   0.00%
I'm not sure how to approach this. It seems as though it should be able to be done with a query, but I've been struggling to make any progress. I'm running sqlite3 on Play Framework 2.1.1.
Thanks!
You can do this with correlated subqueries and joins. The first problem is identifying the most recent price. tmax helps with this by getting the latest date for each item. This is then joined to the original data to get information such as the price.
Then a correlated subquery is used to get the previous price at least xxx milliseconds before that date. Note this is a relative time span, based on the original date. If you want an absolute timespan, then do date arithmetic on the current time.
select t.itemid, t.price - t.prevprice,
       (t.price - t.prevprice) / t.prevprice as change
from (select t.*,
             (select t2.price
              from yourtable t2
              where t2.itemid = t.itemid and
                    t2.date < t.date - xxx
              order by t2.date desc
              limit 1
             ) as prevprice
      from yourtable t join
           (select itemid, max(date) as maxdate
            from yourtable
            group by itemid
           ) tmax
           on tmax.itemid = t.itemid and tmax.maxdate = t.date
     ) t
If you have a large amount of data, you might really consider upgrading to a database other than SQLite. In any event, indexes can help improve performance.
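For instance, a composite index matching the correlated lookup might look like this (hypothetical index name; yourtable as in the query above):

CREATE INDEX idx_yourtable_item_date ON yourtable (itemid, date);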
If I read your question correctly, you want the diff and % between the first price and the last price for each itemid.
I think this will help you (SQLite has no TOP, so the subqueries use ORDER BY ... LIMIT 1; the percentage is taken relative to the first price, matching the expected output):
select
    t1.itemid,
    (select price from yourtable tout where tout.itemid = t1.itemid order by date desc limit 1) -
    (select price from yourtable tout where tout.itemid = t1.itemid order by date asc limit 1) as dif,
    ((select price from yourtable tout where tout.itemid = t1.itemid order by date desc limit 1) -
     (select price from yourtable tout where tout.itemid = t1.itemid order by date asc limit 1)) /
    (select price from yourtable tout where tout.itemid = t1.itemid order by date asc limit 1) * 100 as percent
from yourtable t1
group by t1.itemid

query to display additional column based on aggregate value

I've been mulling on this problem for a couple of hours now with no luck, so I thought people on SO might be able to help :)
I have a table with data regarding processing volumes at stores. The first three columns shown below can be queried from that table. What I'm trying to do is to add a 4th column that's basically a flag indicating whether a store has processed >= $150, and if so, displays the corresponding date. The first date on which the store surpasses $150 is the date that gets displayed; subsequent processing volumes don't count once the activated date has been hit. For example, for store 4, there's just one instance of the activated date.
store_id  sales_volume  date        activated_date
--------  ------------  ----------  --------------
2           5           03/14/2012
2         125           05/21/2012
2          30           11/01/2012  11/01/2012
3         100           02/06/2012
3         140           12/22/2012  12/22/2012
4         300           10/15/2012  10/15/2012
4         450           11/25/2012
5         100           12/03/2012
Any insights as to how to build out this fourth column? Thanks in advance!
The solution starts by calculating the cumulative sales. Then, you want the activation date only when the cumulative sales first pass through the $150 level. This happens when adding the current sales amount pushes the cumulative amount over the threshold. The following case expression handles this:
select t.store_id, t.sales_volume, t.date,
       (case when 150 > cumesales - t.sales_volume and 150 <= cumesales
             then t.date
        end) as ActivationDate
from (select t.*,
             sum(sales_volume) over (partition by store_id order by date) as cumesales
      from t
     ) t
If you have an older version of Postgres that does not support cumulative sums, you can get the cumulative sales with a correlated subquery like:
(select sum(sales_volume) from t t2 where t2.store_id = t.store_id and t2.date <= t.date) as cumesales
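A self-contained check of the case expression against the sample data (PostgreSQL syntax assumed; the inline VALUES mirror the table in the question):

with t (store_id, sales_volume, date) as (
    values (2,   5, date '2012-03-14'),
           (2, 125, date '2012-05-21'),
           (2,  30, date '2012-11-01'),
           (3, 100, date '2012-02-06'),
           (3, 140, date '2012-12-22'),
           (4, 300, date '2012-10-15'),
           (4, 450, date '2012-11-25'),
           (5, 100, date '2012-12-03')
)
select t.store_id, t.sales_volume, t.date,
       (case when 150 > cumesales - t.sales_volume and 150 <= cumesales
             then t.date
        end) as activation_date
from (select t.*,
             sum(sales_volume) over (partition by store_id order by date) as cumesales
      from t
     ) t;
-- activation_date comes out as 2012-11-01 for store 2, 2012-12-22 for store 3,
-- 2012-10-15 for store 4, and NULL for store 5, matching the question.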
Variant 1
You can LEFT JOIN to a table that calculates the first date surpassing the 150 $ limit per store:
SELECT t.*, b.activated_date
FROM tbl t
LEFT JOIN (
SELECT store_id, min(thedate) AS activated_date
FROM (
SELECT store_id, thedate
,sum(sales_volume) OVER (PARTITION BY store_id
ORDER BY thedate) AS running_sum
FROM tbl
) a
WHERE running_sum >= 150
GROUP BY 1
) b ON t.store_id = b.store_id AND t.thedate = b.activated_date
ORDER BY t.store_id, t.thedate;
The calculation of the first day has to be done in two steps, since the window function accumulating the running sum has to be applied in a separate SELECT.
Variant 2
Another window function instead of the LEFT JOIN. May or may not be faster; test with EXPLAIN ANALYZE.
SELECT *
,CASE WHEN running_sum >= 150 AND thedate = first_value(thedate)
OVER (PARTITION BY store_id, running_sum >= 150 ORDER BY thedate)
THEN thedate END AS activated_date
FROM (
SELECT *
,sum(sales_volume)
OVER (PARTITION BY store_id ORDER BY thedate) AS running_sum
FROM tbl
) b
ORDER BY store_id, thedate;