How to implement RESET WHEN (Teradata) using ANSI SQL only? - sql

I need to write a query that counts the number of times customers' transactions exceed 250 Pounds: add cumulatively until the sum exceeds 250, then reset and start from the following row until it exceeds 250 again, and so on. This functionality can be carried out using the Teradata keyword 'RESET WHEN', yet I am supposed to create a query composed of ANSI SQL syntax only.
Can anyone help with that?
SUM(sales) OVER (
    PARTITION BY region
    ORDER BY day_of_calendar
    RESET WHEN sales < /* preceding row */ SUM(sales) OVER (
        PARTITION BY region
        ORDER BY day_of_calendar
        ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
    ROWS UNBOUNDED PRECEDING
)
This is a sample of the customer input: https://i.stack.imgur.com/lu4Jp.png
And a second image shows the expected output.
Every time the customer's total spend exceeds 250, I should start summing from 0 again and find the day on which the customer exceeded 250 USD.

Without your table definitions and with just a screenshot of a very limited dataset, it is hard to test my answer on your data, so I'm showing it first on the dataset supplied in the MATCH_RECOGNIZE tutorial on Live SQL and then with your columns:
SELECT *
FROM ticker MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY tstamp
    MEASURES
        nvl(SUM(up.price), 0) AS tot
    ALL ROWS PER MATCH
    PATTERN ( up* )
    DEFINE
        up AS SUM(up.price) - up.price <= 100
);
So on your table this would be something like
SELECT *
FROM your_table MATCH_RECOGNIZE (
    PARTITION BY region
    ORDER BY day_of_calendar
    MEASURES
        nvl(SUM(up.sales), 0) AS tot
    ALL ROWS PER MATCH
    PATTERN ( up* )
    DEFINE
        up AS SUM(up.sales) - up.sales <= 250
);
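If you only need the day on which each group crosses 250 rather than every row, the same pattern can, I believe, be collapsed to one row per match. This is only a sketch under the same assumptions about your columns (region, day_of_calendar, sales); with ONE ROW PER MATCH the measures use final semantics, so LAST(...) points at the row that pushed the sum over the limit:
SELECT *
FROM your_table MATCH_RECOGNIZE (
    PARTITION BY region
    ORDER BY day_of_calendar
    MEASURES
        LAST(up.day_of_calendar) AS breach_day,  -- last row of the group = the row that crossed 250
        SUM(up.sales)            AS tot          -- group total, including the crossing row
    ONE ROW PER MATCH
    PATTERN ( up+ )                              -- up+ rather than up*, to require at least one row per match
    DEFINE
        up AS SUM(up.sales) - up.sales <= 250
);
Note that the final group for each region may never actually exceed 250; if that matters, filter afterwards on tot > 250.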

Related

Identifying if a column is in descending order

I am using Microsoft SQL Server 2005 Management Studio. I am a bit new, so I hope I am not breaking any rules. My data has 15 columns and almost a million rows; however, I am just giving you a sample to get assistance on one area where I am stuck.
In the above example, as you can see, the 'lastlevel' column values are decreasing. You can also see that the 'Last_read' column's date range runs from today to 14 days prior (it was run yesterday, hence April 27; also, please disregard that the date 2021/04/14 is missing for the first customer, it is an anomaly).
Column 'Shipto' provides the customer number, and each customer has at most 14 rows of data.
Please disregard the 'current_reading' and 'rn' columns.
If you look at 'lastlevel' again, you will notice that the values go down consistently; however, on April 18th it goes from 0.73 to 0.74, an increase of 0.01.
Whenever there is an increase at all, I want all 14 of that customer's rows removed from the output, i.e. I only want to see customers with perfectly descending data and no increases.
Can you help?
WITH
deltas AS
(
    -- For each [Shipto], deduct the preceding row's value and record it as the [delta]
    -- Note: each [Shipto]'s first row's delta will therefore be NULL
    SELECT
        *,
        lastlevel - LAG(lastlevel) OVER (PARTITION BY Shipto ORDER BY Last_Read, lastlevel DESC) AS delta
    FROM
        yourTable
),
max_deltas AS
(
    -- Get the maximum of the deltas per [Shipto]
    SELECT
        *,
        MAX(delta) OVER (PARTITION BY Shipto) AS max_delta
    FROM
        deltas
)
-- Return only rows where the delta never exceeds 0 (thus, never ascending over any timestep)
SELECT
    *
FROM
    max_deltas
WHERE
    max_delta <= 0
I've ordered by Last_Read, lastlevel DESC such that if two readings are on the same date, it is assumed that the highest value should be considered to have happened first.
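If you only need the list of qualifying customer numbers rather than all of their rows, the same delta idea collapses to a plain aggregate. A sketch reusing the assumed columns (Shipto, Last_Read, lastlevel):
WITH deltas AS
(
    SELECT
        Shipto,
        lastlevel - LAG(lastlevel) OVER (PARTITION BY Shipto ORDER BY Last_Read, lastlevel DESC) AS delta
    FROM
        yourTable
)
SELECT
    Shipto
FROM
    deltas
GROUP BY
    Shipto
HAVING
    MAX(delta) <= 0   -- no step ever increases (MAX ignores the NULL first-row delta)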

How to compare the value of one row with the upper row in one column of an ordered table?

I have a table in PostgreSQL that contains GPS points from cell phones. It has an integer column that stores epoch time (the number of seconds from 1960). I want to order the table based on time (the epoch column), then break the trips into sub-trips wherever there is no GPS record for more than 2 minutes.
I did it with GeoPandas; however, it is too slow. I want to do it inside PostgreSQL. How can I compare each row of the ordered table with the previous row (to see if the epoch differs by 2 minutes or more)?
In fact, I do not know how to compare each row with the row above it.
You can use lag():
select t.*
from (select t.*,
             lag(timestamp_epoch) over (partition by trip order by timestamp_epoch) as last_timestamp_epoch
      from t
     ) t
where last_timestamp_epoch < timestamp_epoch - 120
I want to order the table based on time (epoch column), then break the trips into sub-trips when there is no GPS record for more than 2 minutes.
After comparing to the previous (or next) row with the window function lag() (or lead()), form groups based on the gaps to get sub-trip numbers:
SELECT *, count(*) FILTER (WHERE step) OVER (PARTITION BY trip ORDER BY timestamp_epoch) AS sub_trip
FROM  (
    SELECT *
         , (timestamp_epoch - lag(timestamp_epoch) OVER (PARTITION BY trip ORDER BY timestamp_epoch)) > 120 AS step
    FROM   tbl
    ) sub;
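If you then want one row per sub-trip (for example its start, end and point count), the numbered result can simply be aggregated. A sketch on the same assumed table and columns (tbl, trip, timestamp_epoch):
SELECT trip, sub_trip,
       min(timestamp_epoch) AS sub_trip_start,
       max(timestamp_epoch) AS sub_trip_end,
       count(*)             AS gps_points
FROM  (
    SELECT *, count(*) FILTER (WHERE step) OVER (PARTITION BY trip ORDER BY timestamp_epoch) AS sub_trip
    FROM  (
        SELECT *
             , (timestamp_epoch - lag(timestamp_epoch) OVER (PARTITION BY trip ORDER BY timestamp_epoch)) > 120 AS step
        FROM   tbl
        ) sub
    ) numbered
GROUP BY trip, sub_trip
ORDER BY trip, sub_trip;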
Further reading:
Select longest continuous sequence

How to split a list of number into ranges with a fixed interval with SQL?

Let's say I have a table like this
I want to calculate the frequency (how many times that product exists in that price range), in intervals of 50.
So eventually it will give me a table like
The interval for the range will be, let's say, a fixed 50.
We don't know the highest and lowest price of each of these products.
So I will run the query and it will give a table as shown above.
You can use arithmetic and aggregation:
select product, count(*) as frequency,
       floor(price / 50) * 50 as range_start,
       floor(price / 50) * 50 + 50 as range_end
from t
group by product, floor(price / 50)
order by product, min(price)
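If you prefer to write the bin expression only once, the same query can be phrased with a derived table; this is just a sketch on the same assumed table t and the fixed interval of 50:
select product,
       count(*)      as frequency,
       bin * 50      as range_start,   -- lower bound of the 50-wide bucket
       bin * 50 + 50 as range_end      -- upper bound of the bucket
from (select product, price, floor(price / 50) as bin from t) x
group by product, bin
order by product, min(price);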

analyze range and if true tell me

I want to see if the price of a stock has changed by 5% this week. I have data that captures the price every day. I can get the rows from the last 7 days by doing the following:
select price from data where date(capture_timestamp) > date(current_timestamp) - 7;
But then how do I analyze that and see if the price has increased or decreased by 5%? Is it possible to do all this with one SQL statement? I would like to then insert any results into a new table, but I just want to focus on printing it out in the shell first.
Thanks.
It seems odd to have only one stock in a table called data. What you need to do is bring the two rows together for last week's and today's values, as in the following query:
select d.price
from data d cross join
     data dprev
where cast(d.capture_timestamp as date) = cast(current_timestamp as date) and
      cast(dprev.capture_timestamp as date) = cast(current_timestamp as date) - 7 and
      d.price > dprev.price * 1.05
If the data table contains the stock ticker, the cross join would be an equijoin.
You may be able to build on the following query for whatever calculations you want to do. This assumes one record per day; the 7 preceding rows is literal.
SELECT ticker, price, capture_ts,
       MIN(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS min_prev_7_records,
       MAX(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS max_prev_7_records
FROM data
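To turn the min/max columns into the actual 5% check, one option is to wrap the query and compare the current price against the window extremes. A sketch on the same assumed columns:
SELECT *
FROM (
    SELECT ticker, price, capture_ts,
           MIN(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS min_prev_7_records,
           MAX(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS max_prev_7_records
    FROM data
) t
WHERE price >= min_prev_7_records * 1.05   -- up at least 5% from the window's low
   OR price <= max_prev_7_records * 0.95;  -- or down at least 5% from the window's high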

Optimizing a Vertica SQL query to do running totals

I have a table S with time series data like this:
key, day, delta
For a given key, it's possible but unlikely that days will be missing.
I'd like to construct a cumulative column from the delta values (positive INTs), for the purposes of inserting this cumulative data into another table. This is what I've got so far:
SELECT key, day,
       SUM(delta) OVER (PARTITION BY key ORDER BY day ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
       delta
FROM S
In my SQL flavor, the default window clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, but I left it in there to be explicit.
This query is really slow, an order of magnitude slower than the old broken query, which filled in 0s for the cumulative count. Any suggestions for other methods to generate the cumulative numbers?
I did look at the solutions here:
Running total by grouped records in table
The RDBMS I'm using is Vertica. Vertica SQL precludes the first subselect solution there, and its query planner predicts that the second left-outer-join solution is about 100 times more costly than the analytic form I show above.
I think you're essentially there. You may just need to update the syntax a bit:
SELECT s_qty,
       SUM(s_price) OVER (
           PARTITION BY NULL
           ORDER BY s_qty ASC
           ROWS UNBOUNDED PRECEDING) "Cumulative Sum"
FROM sample_sales;
Output:
S_QTY | Cumulative Sum
------+---------------
    1 |           1000
  100 |          11000
  150 |          26000
  200 |          28000
  250 |          53000
  300 |          83000
 2000 |         103000
(7 rows)
reference link:
https://dwgeek.com/vertica-cumulative-sum-average-and-example.html/
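Applied to the table S from the question (key, day, delta), the same idea looks like the sketch below; swapping the default RANGE frame for an explicit ROWS frame is the usual first thing to try, since a ROWS frame is often cheaper to evaluate:
SELECT key, day, delta,
       SUM(delta) OVER (PARTITION BY key
                        ORDER BY day ASC
                        ROWS UNBOUNDED PRECEDING) AS cumulative_delta
FROM S;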
Sometimes it's faster to just use a correlated subquery:
SELECT
    [key]
  , [day]
  , delta
    -- running total per key, ordered by day, including the current row
  , (SELECT SUM(delta) FROM S WHERE [key] = t1.[key] AND [day] <= t1.[day]) AS DeltaSum
FROM S t1