How to add a value to future values? - SQL

So I am quite new to SQL so please be gentle.
I am using Oracle Apex to generate a bar chart showing the amount of storage used in a particular database over the last 12 months. In this chart I am trying to add an additional feature where the chart will show (via a line) how much storage will likely be used on a fortnightly basis over the course of the next three months.
I have created a PL/SQL function (GET_PREDICTED_VALUE) which generates the anticipated value I require, but I am having trouble using that value to show the predicted trend. If the generated value is positive, that's how much more storage is needed; if it is negative, that's how much less storage is needed. (The arguments passed to this function are how many days ago to draw the 'storage used' values from: 30 means 30 days ago, 150 means 150 days ago, etc.)
What I would like to do is use this anticipated value and add it to the last ‘current storage used’ value recorded in the database. I would like the anticipated value to be continually added so that a future trend can be observed.
For example, if the last ‘current storage used’ value saved in the database is 50, and the anticipated value is 3, then the value for the immediate future date should be 53. And the next future value should be 56, and the next should be 59, and so on up to three months ahead.
This is the code I have so far, and it works insofar as the chart I need does get generated and the trend line runs across the previous year and then the future three months. But for the future three months, I am only able to get the trend line to represent the anticipated value itself, not the continually added value. Again, if the anticipated value is 3, my trend line shows 3 across all future dates.
WITH src AS (
    SELECT GET_PREDICTED_VALUE(1, 30, 60, 90, 120, 150, 180) AS predictedValue
    FROM dual
),
mydays AS (
    SELECT level AS mylevel,
           SYSDATE + level AS futureDate
    FROM dual
    CONNECT BY level <= 90
),
future_values AS (
    SELECT MAX(src.predictedValue) AS "ASM Used",
           TO_CHAR(TRUNC(futureDate, 'IW'), 'DD/MM/YY') AS "Capture Date",
           TRUNC(futureDate, 'IW') AS capture_date_order
    FROM mydays
    CROSS JOIN src
    GROUP BY TRUNC(futureDate, 'IW')
    ORDER BY TRUNC(futureDate, 'IW')
),
src_back AS (
    SELECT TRUNC(capture_date, 'IW') AS week,
           database_name,
           MAX(os_usable_total_mb - os_usable_free_mb) AS ASMUsed
    FROM tablespace_used
    WHERE capture_date >= (SYSDATE - 360)
    GROUP BY TRUNC(capture_date, 'IW'), database_name, tablespace_name
),
back_values AS (
    SELECT /*+ parallel(a, 8) */
           TO_CHAR(week, 'DD/MM/YY') AS "Capture Date",
           week AS capture_date_order,
           ROUND(MAX(ASMUsed / 1024 / 1024)) AS "ASM Used"
    FROM src_back a
    WHERE week >= (SYSDATE - 360)
    GROUP BY TO_CHAR(week, 'DD/MM/YY'), week
    ORDER BY 2
)
SELECT "Capture Date", capture_date_order, "ASM Used" FROM back_values
UNION
SELECT "Capture Date", capture_date_order, "ASM Used" FROM future_values
ORDER BY 2
I hope my explanation is clear; could anyone let me know what I have to do to get my trend line running the way I need?
[Image: how my chart looks so far]
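In case it's useful, a minimal sketch of one way to get the running accumulation (untested): since every future step adds the same predicted amount, the value at step n is simply the last recorded value plus n times the predicted value. The last_recorded CTE below is hypothetical, and GET_PREDICTED_VALUE is assumed to return the expected growth per charted week:

WITH src AS (
    SELECT GET_PREDICTED_VALUE(1, 30, 60, 90, 120, 150, 180) AS predictedValue
    FROM dual
),
last_recorded AS (
    -- hypothetical: the most recent recorded usage, in the same units as the chart
    SELECT ROUND(MAX((os_usable_total_mb - os_usable_free_mb) / 1024 / 1024)) AS last_used
    FROM tablespace_used
    WHERE capture_date = (SELECT MAX(capture_date) FROM tablespace_used)
),
mydays AS (
    SELECT SYSDATE + level AS futureDate
    FROM dual
    CONNECT BY level <= 90
)
SELECT TO_CHAR(TRUNC(futureDate, 'IW'), 'DD/MM/YY') AS "Capture Date",
       TRUNC(futureDate, 'IW') AS capture_date_order,
       -- add one predictedValue per future week elapsed
       lr.last_used
         + ROW_NUMBER() OVER (ORDER BY TRUNC(futureDate, 'IW')) * src.predictedValue AS "ASM Used"
FROM mydays
CROSS JOIN src
CROSS JOIN last_recorded lr
GROUP BY TRUNC(futureDate, 'IW'), src.predictedValue, lr.last_used

This branch would replace future_values in the UNION; the historical back_values half stays exactly as in the question.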

Related

Find the nearest overlap between given time series

I'm building a scheduling system where I store an initial appointment and how often it repeats. My table looks something like this:
CREATE TABLE tbl (
id serial primary key,
initial_timestamp timestamp not null,
recurring interval
);
id initial_timestamp recurring
27 2020-06-02 3 weeks
24 2020-06-03 10 days
Assuming I can handle the time component, and that the only intervals we'll run across are days and weeks, how can I find when those two appointments will overlap? For example, the two appointments above will overlap on June 23rd. It's 3 weeks from June 2nd and 20 days from June 3rd, so the first appointment will repeat once on that day and the second appointment will repeat on the 13th and then the 23rd.
In my program, I have another date, say June 7th with a recurring interval of 12 days. What query can I use to find the time it will take for a recurring appointment starting on June 7th to overlap with every existing recurring appointment? So for example, this appointment will repeat on June 19, July 1, and July 13. Appointment #24 from the table above will repeat on June 13, June 23, July 3, and July 13, if my math is right. I'd like my query comparing this appointment to appointment #24 to return, first of all, July 13th, then also how long it would take to repeat again, which I assume would be like finding the least common multiple of the two intervals, in this case, 60 days (LCM of 12 and 10). So I could expect it to repeat again on July 13 + 60 days = Sept 11.
I tried using generate_series, but since I don't know the size of the intervals, the series would have to continue infinitely, right? It's probably not the best choice here. I assume the answer would have more to do with the math of multiplying intervals somehow.
Note that recurring can be null, so I'd assume there has to be something like WHERE recurring IS NOT NULL in there somewhere. Another thing to note: no initial appointments overlap; I've already guarded against that. The search term doesn't overlap with any of the appointments' initial times either.
If it helps at all, I'm using PHP 5.3 to send queries to Postgres 9.4 (I know, it's an ancient setup). I'd prefer to do most of this in SQL just because most of the other logic is in SQL right now, so I can just run the query and start manipulating the results with PHP.
So in summary, if my math is right, what Postgres query should I use with the table above to compare a given date and interval with every date and interval pair from the table to find the next date those two overlap and how far apart each overlap instance would be?
This was hard.
WITH RECURSIVE moving_target(initial_timestamp, recurring) AS (
   VALUES (timestamp '2020-06-07', interval '12 days')  -- search term
)
, x AS (  -- advance to the closest day before or at moving target
   SELECT t.id
        , t_date + ((m_date - t_date) / t_step) * t_step AS t_date
        , t_step
        , m.*
   FROM  (  -- normalize table data
      SELECT id
           , initial_timestamp::date AS t_date
           , EXTRACT ('days' FROM recurring)::int AS t_step
      FROM   tbl
      WHERE  recurring IS NOT NULL  -- exclude!
      ) t
   CROSS JOIN (  -- normalize input
      SELECT initial_timestamp::date AS m_date
           , EXTRACT ('days' FROM recurring)::int AS m_step
      FROM   moving_target
      ) m
)
, rcte AS (  -- recursive CTE
   SELECT id, t_date, t_step, m_date, m_step
        , ARRAY[m_date - t_date] AS gaps  -- keep track of gaps
        , CASE
             WHEN t_date = m_date THEN true       -- found match
             WHEN t_step % m_step = 0 THEN false  -- can never match
             WHEN (m_date - t_date) % 2 = 1       -- odd gap ...
              AND t_step % 2 = 0                  -- ... but even steps
              AND m_step % 2 = 0 THEN false       -- can never match
             -- WHEN <stop conditions?> THEN false -- hard to determine!
             -- ELSE null                          -- keep searching
          END AS match
   FROM   x

   UNION ALL
   SELECT id, t_date, t_step, m_date, m_step
        , gaps || m_date - t_date
        , CASE
             WHEN t_date = m_date THEN true
             WHEN (m_date - t_date) = ANY (gaps) THEN false  -- gap repeated!
             -- ELSE null  -- keep searching
          END AS match
   FROM  (
      SELECT id
           , t_date + (((m_date + m_step) - t_date) / t_step) * t_step AS t_date
           , t_step
           , m_date + m_step AS m_date  -- + 1 step
           , m_step
           , gaps
      FROM   rcte
      WHERE  match IS NULL
      ) sub
)
SELECT id, t.initial_timestamp, t.recurring
     , CASE WHEN r.match THEN r.t_date END AS match_date
FROM   rcte r
JOIN   tbl t USING (id)
WHERE  r.match IS NOT NULL;
db<>fiddle here - with more test rows
There may be potential to improve further. The core problem is in the realm of prime factorization. As it seems reasonable to expect fairly small intervals, I solved it by testing for cycles: if, while incrementally stepping forward, a gap between dates is detected that we have seen before, and the dates didn't overlap yet, they will never overlap and we can stop. This loops at most GREATEST(m_step, t_step) times (the number of days in the bigger interval), so it shouldn't scale terribly.
I identified some basic mathematical stop conditions to avoid looping in hopeless cases a priori. There may be more ...
Explaining everything that's going on here is more work than devising the query. I added comments that should explain basics ...
Then again, while intervals are small, a "brute force" approach based on generate_series() may still be faster.
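To illustrate, a minimal brute-force sketch along those lines (untested; it caps the search at one year purely for illustration and reuses the table and search term from above):

-- expand both series over a fixed horizon and intersect them
SELECT t.id, MIN(g1.day) AS first_match
FROM   tbl t
CROSS  JOIN LATERAL generate_series(t.initial_timestamp
                                  , t.initial_timestamp + interval '1 year'
                                  , t.recurring) AS g1(day)
JOIN   generate_series(timestamp '2020-06-07'
                     , timestamp '2020-06-07' + interval '1 year'
                     , interval '12 days') AS g2(day) ON g2.day = g1.day
WHERE  t.recurring IS NOT NULL
GROUP  BY t.id;

If the least common multiple of the two steps exceeds the horizon, a genuine match could be missed, which is exactly the trade-off against the cycle-detection approach above.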

Identifying premature expiration

The dataset I have is a bit tricky. It’s a rolling calendar for a period of 24 months, and the data is published only once a month.
The relevant data points are as follows:
• CaseNumber (int)
• Start_date (date)
• Reporting_month (date)
• Months_old (int)
The above 'CaseNumber' has the 'potential' to appear in as many as 24 'Reporting_month's (Months_old 0-23). However, each 'CaseNumber' will only appear one time in any given 'Reporting_month'.
So if you list any number of months in chronological order (Jan, Feb, Mar, Apr, etc.), a single 'CaseNumber' will show up in each one of those 'Reporting_months' as long as the 'CaseNumber' is 23 'Months_old' or less.
However, once a 'CaseNumber' reaches 24 'Months_old' it will no longer report in this data set. So the oldest any particular 'CaseNumber' will ever be in this reporting cycle is 23 'Months_old'; any older and it will not appear on this report.
What I'm interested in doing is tracking these 'CaseNumbers' to see if any are dropping off of this report prematurely. To do so, I need to be able to compare the current 'Reporting_month' to the previous 'Reporting_month' to determine whether any of the 'CaseNumbers' prematurely dropped off.
Example:
Case #   Previous Months_old   Current Months_old   Status
1234     22                    23                   Correct age
5678     23                    NULL                 Dropped due to age
9101     18                    NULL                 Premature drop
The only means I've had to achieve this is a VLOOKUP formula in Excel, done manually. I'd like to get away from having to complete this manually.
SELECT
a.[CaseNumber]
,CONVERT(DATE,MAX(a.[Month]),111) 'Month'
,CASE WHEN m2.[CaseNumber] IS NOT NULL
AND m1.[CaseNumber] IS NULL
THEN 'Yes'
ELSE 'No'
END as 'New Default'
FROM
[dbo].['v2-2yrTotalDefault$'] a
LEFT OUTER JOIN (
SELECT DISTINCT
[CaseNumber]
FROM
[dbo].['v2-2yrTotalDefault$']
WHERE
LEFT(CONVERT(varchar,[Month],112),6) = '201902') m1
ON m1.CaseNumber = a.CaseNumber --most current month
LEFT OUTER JOIN (
SELECT DISTINCT
[CaseNumber]
FROM
[dbo].['v2-2yrTotalDefault$']
WHERE
LEFT(CONVERT(varchar,[Month],112),6) = '201903') m2
ON m2.CaseNumber = a.CaseNumber --previous month
WHERE
a.[Month] > '12/01/2018'
GROUP BY
a.[CaseNumber]
ORDER BY
a.[CaseNumber]
The query continually errors out with the following error:
Msg 8120, Level 16, State 1, Line 8
Column 'm2.CaseNumber' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Msg 8120, Level 16, State 1, Line 9
Column 'm1.CaseNumber' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Additionally, I don't want to have to hard-code the months in the SELECT statement; I'd like to be able to control which month I'm viewing in the WHERE clause.
In the end I'd like the results to return two columns: one reflecting the previous month's age in months, and the second showing the current month's age. If a CaseNumber dropped off prematurely, I'd like the current month to say 'premature_expiration'.
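One possible direction, sketched under assumptions (untested): on SQL Server 2012 or later, LEAD can fetch each case's age in the following reporting month without hard-coded month literals. @ReportMonth is a hypothetical parameter naming the month treated as "previous", and Reporting_month/Months_old are the columns described above:

DECLARE @ReportMonth date = '2019-02-01';  -- hypothetical: the month treated as "previous"

WITH ordered AS (
    SELECT CaseNumber,
           Reporting_month,
           Months_old,
           -- age of the same case in the next reporting month, if it appears at all
           LEAD(Months_old) OVER (PARTITION BY CaseNumber
                                  ORDER BY Reporting_month) AS next_months_old
    FROM [dbo].['v2-2yrTotalDefault$']
)
SELECT CaseNumber,
       Months_old      AS previous_months_old,
       next_months_old AS current_months_old,
       CASE WHEN next_months_old IS NOT NULL THEN 'Correct age'
            WHEN Months_old >= 23            THEN 'Dropped due to age'
            ELSE 'premature_expiration'
       END AS Status
FROM ordered
WHERE Reporting_month >= @ReportMonth
  AND Reporting_month <  DATEADD(month, 1, @ReportMonth);

Against the example rows above this would label 1234 'Correct age' (22 then 23), 5678 'Dropped due to age' (23 then absent), and 9101 'premature_expiration' (18 then absent).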

Find two local averages within one SQL Server data set

In the plant at our company there is a physical process that has a two-stage start and a two-stage finish. As a widget starts to enter the process a new record is created containing the widget ID and a timestamp (DateTimeCreated) and once the widget fully enters the process another timestamp is logged in a different field for the same record (DateTimeUpdated). The interval is a matter of minutes.
Similarly, as a widget starts to exit the process another record is created containing the widget ID and the DateTimeCreated, with the DateTimeUpdated being populated when the widget has fully exited the process. In the current table design an "exiting" record is indistinguishable from an "entering" record (although a given widget ID occurs only either once or twice so a View could utilise this fact to make the distinction, but let's ignore that for now).
The overall time a widget is in the process is several days but that's not really of importance to the discussion. What is important is that the interval when exiting the process is always longer than when entering. So a very simplified, imaginary set of sorted interval values might look like this:
1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 6, 7, 7, 7, 7, 8, 8, 8, 8, 10, 10, 10
You can see there is a peak in the occurrences of intervals around the 3-minute-mark (the "enters") and another peak around the 7/8-minute-mark (the "exits"). I've also excluded intervals of 5 minutes to demonstrate that enter-intervals and exit-intervals can be considered mutually exclusive.
We want to monitor the performance of each stage in the process daily by using a query to determine the local averages of the entry and exit data point clusters. So conceptually the two data sets could be split either side of the overall average (in this case 5.375), and then an average calculated for the values below the split (2.75) and another for the values above it (8). Using the data above (in a random distribution), these two averages appeared as dotted lines in a chart (not reproduced here).
My current approach is to use two Common Table Expressions followed by a final three-table-join query. It seems okay, but I can't help feeling it could be better. Would anybody like to offer an alternative approach or other observations?
WITH cte_Raw AS
(
SELECT
DATEDIFF(minute, DateTimeCreated, DateTimeUpdated) AS [Interval]
FROM
MyTable
WHERE
DateTimeCreated > CAST(CAST(GETDATE() AS date) AS datetime) -- Today
)
, cte_Midpoint AS
(
SELECT
AVG(Interval) AS Interval
FROM
cte_Raw
)
SELECT
AVG([Entry].Interval) AS AverageEntryInterval
, AVG([Exit].Interval) AS AverageExitInterval
FROM
cte_Raw AS [Entry]
INNER JOIN
cte_Midpoint
ON
[Entry].Interval < cte_Midpoint.Interval
INNER JOIN
cte_Raw AS [Exit]
ON
[Exit].Interval > cte_Midpoint.Interval
I don't think your query produces accurate results. Your two JOINs produce a proliferation of rows, which throws the averages off. The results might look correct (because one is less than the other), but if you did counts, you would see that the counts in your query have little to do with the sample data.
If you are just looking for the average of values that are less than the overall average and the average of those greater than it, then you can use window functions:
WITH t AS (
SELECT t.*, v.[Interval],
AVG(v.[Interval]) OVER () as avg_interval
FROM MyTable t CROSS JOIN
(VALUES (DATEDIFF(minute, DateTimeCreated, DateTimeUpdated))
) v(Interval)
WHERE DateTimeCreated > CAST(CAST(GETDATE() AS date) AS datetime)
)
SELECT AVG(CASE WHEN t.[Interval] < t.avg_interval THEN t.[Interval] END) AS AverageEntryInterval,
AVG(CASE WHEN t.[Interval] > t.avg_interval THEN t.[Interval] END) AS AverageExitInterval
FROM t;
I decided to post my own answer as, at the time of writing, neither of the two proposed answers will run (in the window-function version, the VALUES constructor cannot reference MyTable's columns through a CROSS JOIN; that would require CROSS APPLY). I have, however, removed the JOIN statements and used the CASE approach proposed by Gordon.
I've also multiplied the DATEDIFF result by 1.0 to prevent integer rounding of results from the AVG function.
WITH cte_Raw AS
(
SELECT
1.0 * DATEDIFF(minute, DateTimeCreated, DateTimeUpdated) AS [Interval]
FROM
MyTable
WHERE
DateTimeCreated > CAST(CAST(GETDATE() AS date) AS datetime) -- Today
)
, cte_Midpoint AS
(
SELECT
AVG(Interval) AS Interval
FROM
cte_Raw
)
SELECT AVG(CASE WHEN cte_Raw.Interval < cte_Midpoint.Interval THEN cte_Raw.[Interval] END) AS AverageEntryInterval,
AVG(CASE WHEN cte_Raw.Interval > cte_Midpoint.Interval THEN cte_Raw.[Interval] END) AS AverageExitInterval
FROM cte_Raw CROSS JOIN cte_Midpoint
This solution does not cater for the theoretical pitfall indicated by Vladimir of uneven dispersions of Entry vs Exit intervals, as in practice we can be confident this does not occur.
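For completeness, a minimal sketch (untested) of how the window-function idea can be made to compile: swap the CROSS JOIN for CROSS APPLY so the VALUES row can reference the table's columns, with the 1.0 multiplication folded in:

WITH t AS (
    SELECT v.[Interval],
           AVG(v.[Interval]) OVER () AS avg_interval  -- overall average across all rows
    FROM MyTable
    CROSS APPLY (VALUES (1.0 * DATEDIFF(minute, DateTimeCreated, DateTimeUpdated))) v([Interval])
    WHERE DateTimeCreated > CAST(CAST(GETDATE() AS date) AS datetime)  -- Today
)
SELECT AVG(CASE WHEN [Interval] < avg_interval THEN [Interval] END) AS AverageEntryInterval,
       AVG(CASE WHEN [Interval] > avg_interval THEN [Interval] END) AS AverageExitInterval
FROM t;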

Oracle - Count the same value used on consecutive days

Date jm Text
-------- ---- ----
6/3/2015 ne Good
6/4/2015 ne Good
6/5/2015 ne Same
6/8/2015 ne Same
I want to count how often the "same" value occurs in a set of consecutive days.
I don't want to count the value for the whole database. As of the current date, it is 2 (example above).
It is very important for me that "Same" never occurs...
The query has to ignore the weekend (6 and 7 June).
Date jm Text
-------- ---- ----
6/3/2015 ne Same
6/4/2015 ne Same
6/5/2015 ne Good
6/8/2015 ne Good
In this example, the count is zero.
Okay, I'm starting to get the picture, although at first I thought you wanted to count by jm, and now it seems you want to count by Text = 'Same'. Anyway, that's what this query should do. It gets the row for the current date, connects all the previous rows to it, and counts them. It also shows whether the current text (and that of the connected rows) is 'Same'.
So the query will return one row (if there is one for today), which will show the date, jm and Text of the current date, the number of consecutive days for which the Text has been the same (just in case you want to know how many days it is 'Good'), and the number of days (either 0 or the same as the other count) for which the Text has been 'Same'.
I hope this query is right, or at least it gives you an idea of how to solve the problem using CONNECT BY. I should mention I based the 'Friday-detection' on this question.
Also, I don't have Oracle at hand, so please forgive me for any minor syntax errors.
WITH
VW_SAMESTATUSES AS
( SELECT t.*
FROM YourTable t
START WITH -- Start with the row for today
t.Date = trunc(sysdate)
CONNECT BY -- Connect to previous row that have a lower date.
-- Note that PRIOR refers to the prior record, which is
-- actually the NEXT day. :)
t.Date = PRIOR t.Date +
CASE MOD(TO_CHAR(t.Date, 'J'), 7) + 1
WHEN 5 THEN 3 -- Friday, so add 3
ELSE 1 -- Other days, so add one
END
-- And the Text also has to match to the one of the next day.
AND t.Text = PRIOR t.Text)
SELECT s.Date,
s.jm,
MAX(Text) AS CurrentText, -- Not really MAX, they are actually all the same
COUNT(*) AS ConsecutiveDays,
COUNT(CASE WHEN Text = 'Same' THEN 1 END) as SameCount
FROM VW_SAMESTATUSES s
GROUP BY s.Date,
s.jm
This recursive query (available from Oracle version 11g) might be useful:
with s(tcode, tdate) as (
select tcode, tdate from test where tdate = date '2015-06-08'
union all
select t.tcode, t.tdate from test t, s
where s.tcode = t.tcode
and t.tdate = s.tdate - decode(s.tdate-trunc(s.tdate, 'iw'), 0, 3, 1) )
select count(1) cnt from s
SQLFiddle
I prepared sample data according to your original question, without further edits; you can see it in the attached SQLFiddle. Additional conditions on the 'Text' column are very simple: just add something like and Text = 'Same' to the WHERE clauses.
In the current version, the query counts the number of previous days, starting from the given date (change it in line 2), where the dates are consecutive (excluding weekend days) and the value in the tcode column is the same for all days.
The part decode(s.tdate - trunc(s.tdate, 'iw'), 0, 3, 1) subtracts three days when the date is a Monday and one day otherwise, and should work independently of NLS settings.
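As a quick sanity check of that expression (2015-06-08 is a Monday, so it should step back to Friday the 5th):

select date '2015-06-08'
       - decode(date '2015-06-08' - trunc(date '2015-06-08', 'iw'), 0, 3, 1) as prev_workday
from dual;
-- trunc(date, 'iw') is the Monday of that ISO week, so the difference is 0 for
-- Mondays (subtract 3 days) and 1-6 otherwise (subtract 1 day);
-- here it returns 2015-06-05, a Friday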

Correct SQL statement to return the row that represents the front month expiry contract

I have a SQL Server 2008 R2 database of trade records for several equity options, each at one-minute intervals, and each minute contains records for several expiries, e.g.:
Symbol, TradeDate, Expiry, Open, High, Low, Close
AMZN, 4/01/2009 9:31:00, 4/17/2009, 8, 10, 9, 8.5
AMZN, 4/01/2009 9:31:00, 5/16/2009, 10, 11, 10, 11
AMZN, 4/01/2009 9:31:00, 6/18/2009, 12,13,12,12
GOOG, 4/01/2009 9:31:00, 4/17/2009, 8, 9, 7, 7.5
AMZN, 4/01/2009 9:32:00, 4/17/2009, 8.2, 8.9, 8.3, 8.5
AMZN, 4/01/2009 9:32:00, 5/16/2009, 3, 4, 4, 4
...
AMZN, 4/20/2009 9:31:00, 5/16/2009, 8.5, 9, 8.75, 8.75
AMZN, 4/20/2009 9:31:00, 6/18/2009, 9, 10, 9, 9.2
In options there is always a notion of the front month contract. For this problem, define the front month contract as: if there are TradeDate entries on or before the expiry date of the nearest contract, that contract is the front month; otherwise, the front month is the next month's contract. So, for example, in the data above, on 4/01/2009 the AMZN front month is the contract that expires on 4/17/2009. However, when we move to TradeDate 4/20/2009, the front month is the 5/16/2009 contract, since the 4/17/2009 contract expired over the weekend.
What is the SQL statement that would always return all the correct rows giving the "front month contract" based on what the TradeDate is?
From what you have described, the following query should do it. A self join:

SELECT T1.Symbol, T1.TradeDate, T1.Expiry,
       MIN(T2.Expiry) AS FrontMonthContract,
       T1.Open, T1.High, T1.Low, T1.Close
FROM <TABLE> T1, <TABLE> T2
WHERE T1.TradeDate <= T2.Expiry
GROUP BY T1.Symbol, T1.TradeDate, T1.Expiry
But it will not work if there is no entry for a trade of the FrontMonth Contract.
What I would feel better about is that you either hand-input a list of expiries or compute them based on some rule, like the last Friday of the month (if there is a rule), so that you do not risk miscalculating the front month when there is no trade for the front-month contract.
Better still, do it in your application instead of SQL, as SQL is not meant for such work. In your application it would be a simple comparison against a list of expiries.
I have reduced the execution time of a charting function which worked on data from a SQLite database by over 90% by doing such computations in the application itself instead of SQL.
Update:
Try the following query. It assumes the table name to be TRADES.
SELECT
T1.Symbol,
T1.TradeDate,
T1.Expiry,
MIN(T2.Expiry) AS 'FrontMonthContract',
MIN(T1.[Open]) AS 'Open',
MIN(T1.[High]) AS 'High',
MIN(T1.[Low]) AS 'Low',
MIN(T1.[Close]) AS 'Close'
FROM
TRADES T1, TRADES T2
WHERE T1.TradeDate <= T2.Expiry AND T1.Symbol = T2.Symbol
GROUP BY T1.Symbol , T1.TradeDate, T1.Expiry
I just built a sample table with the data you provided in the question, and this query works as expected on that data set. For the record, I have SQL Server 2005.
Update 2:
To optimize the execution of the query, try adding an index on the three GROUP BY columns Symbol, TradeDate, Expiry, in that order.
I created a query execution plan and over 60% of the time went to resolving the GROUP BY; after adding this index in my sample db that cost was completely gone.
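For illustration, such an index might be created like this (the index name is made up):

CREATE INDEX IX_Trades_Symbol_TradeDate_Expiry
    ON TRADES (Symbol, TradeDate, Expiry);

Covering the GROUP BY keys in index order lets the engine stream pre-sorted rows instead of sorting or hashing them.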