calculating "Max Draw Down" in SQL - sql

edit: it's worth reviewing the comments section of the first answer to get a clearer idea of the problem.
edit: I'm using SQLServer 2005
Something similar to this was posted before, but I don't think enough information was given by the poster to truly explain what max drawdown is. All my definitions of max drawdown come from (the first two pages of) this paper:
http://www.stat.columbia.edu/~vecer/maxdrawdown3.pdf
effectively, you have a few terms defined mathematically:
Running maximum, M_t:
M_t = max_{u in [0,t]} S_u
where S_t is the price of a stock, S, at time t.
Drawdown, D_t:
D_t = M_t - S_t
Max drawdown, MDD_t:
MDD_t = max_{u in [0,t]} D_u
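For example, for the closing prices 10, 12, 8, 11 the running maxima are 10, 12, 12, 12, the drawdowns are 0, 0, 4, 1, and so the max drawdown over the period is 4.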
so, effectively, what needs to be determined are the local maximums and minimums from a set of hi and low prices for a given stock over a period of time.
I have a historical quote table with the following (relevant) columns:
stockid int
day date
hi int --this is in pennies
low int --also in pennies
so for a given date range, you'll see the same stockid every day for that date range.
EDIT:
hi and low are the high and the low price for each day.
once the local max's and min's are determined, you can pair every max with every min that comes after it and calculate the difference. From that set, the maximum difference would be the "Max Draw Down".
The hard part though, is finding those max's and min's.
edit: it should be noted:
max drawdown is defined as the value of the hypothetical loss if the stock is bought at its highest buy point and sold at its lowest sell point. A stock can't be sold at a minval that came before a maxval. So, if the global minval comes before the global maxval, those two values do not provide enough information to determine the max drawdown. (For example, with prices 5, 20, 15, the global min 5 precedes the global max 20, yet the max drawdown is 20 - 15 = 5.)

A brutally inefficient, but very simple, version using a view is below:
WITH DDView AS (
    SELECT pd_curr.StockID,
           pd_curr.Date,
           pd_curr.Low_Price AS CurrPrice,
           pd_prev.High_Price AS PrevPrice,
           pd_curr.Low_Price / pd_prev.High_Price - 1.0 AS DD
    FROM PriceData pd_curr
    INNER JOIN PriceData pd_prev
        ON pd_curr.StockID = pd_prev.StockID
        AND pd_curr.Date >= pd_prev.Date
        AND pd_curr.Low_Price <= pd_prev.High_Price
        AND pd_prev.Date >= '2001-12-31' -- #param: min_date of analyzed period
    WHERE pd_curr.Date <= '2010-09-30' -- #param: max_date of analyzed period
)
SELECT dd.StockID,
       MIN(COALESCE(dd.DD, 0)) AS MaxDrawDown
FROM DDView dd
GROUP BY dd.StockID
As you would usually perform the analysis on a specific time period, it would make sense to wrap the query in a stored procedure with the parameters @StartDate, @EndDate and possibly @StockID. Again, this is quite inefficient by design, O(N^2), but if you have good indices and not a huge amount of data, SQL Server will handle it pretty well.
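A sketch of such a wrapper, assuming the PriceData table and column names from the query above (the procedure and parameter names are illustrative):
CREATE PROCEDURE dbo.GetMaxDrawDown -- hypothetical name
    @StartDate datetime, -- datetime because SQL Server 2005 has no date type
    @EndDate   datetime,
    @StockID   int = NULL -- NULL means all stocks
AS
BEGIN
    SET NOCOUNT ON;
    WITH DDView AS (
        SELECT pd_curr.StockID,
               1.0 * pd_curr.Low_Price / pd_prev.High_Price - 1.0 AS DD -- 1.0 factor guards against integer division
        FROM PriceData pd_curr
        INNER JOIN PriceData pd_prev
            ON pd_curr.StockID = pd_prev.StockID
            AND pd_curr.Date >= pd_prev.Date
            AND pd_curr.Low_Price <= pd_prev.High_Price
            AND pd_prev.Date >= @StartDate
        WHERE pd_curr.Date <= @EndDate
          AND (@StockID IS NULL OR pd_curr.StockID = @StockID)
    )
    SELECT StockID,
           MIN(COALESCE(DD, 0)) AS MaxDrawDown
    FROM DDView
    GROUP BY StockID;
END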

Some things we need to consider in the problem domain:
Stocks have a range of prices every day, often viewed in candlestick charts
lets call the highest price of a day HI
lets call the lowest price of a day LOW
the problem is constrained by time, even if the time constraints are the IPO date and Delisting Dates
the maximum drawdown is the most you could possibly lose on a stock over that timeframe
assuming a LONG strategy: logically, if we are able to determine all local maxes (MAXES) and all local mins (MINS), we could define a set, DIFFS, where we pair each MAX with each subsequent MIN and calculate the difference
Sometimes the difference will result in a negative number, however that is not a drawdown
therefore, we need to append 0 to the set of DIFFS and select the max
The problem lies in defining the MAXES and the MINS; if we had the function of the curve we could apply calculus, but unfortunately we don't. Obviously:
the maxes need to come from the HI and
the MINS need to come from the LOW
One way to solve this is to define a cursor and brute force it. Functional languages have nice toolsets for solving this as well.
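For illustration, a minimal T-SQL cursor sketch of that brute force, assuming the question's quote table (called quotes here) with stockid/day/hi/low columns; like the function answer further down, it measures each day's low against the running high of previous days, since the intraday ordering of hi and low is unknown:
DECLARE @hi int, @low int, @runHi int, @mdd int;
SELECT @runHi = 0, @mdd = 0;

DECLARE q CURSOR FAST_FORWARD FOR
    SELECT hi, low
    FROM quotes
    WHERE stockid = 1 -- hypothetical stock id
    ORDER BY [day];
OPEN q;
FETCH NEXT FROM q INTO @hi, @low;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- drawdown against the running high of previous days
    IF @runHi - @low > @mdd SET @mdd = @runHi - @low;
    -- then fold today's high into the running maximum
    IF @hi > @runHi SET @runHi = @hi;
    FETCH NEXT FROM q INTO @hi, @low;
END
CLOSE q;
DEALLOCATE q;
SELECT @mdd AS MaxDrawDown; -- in pennies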

For SQL Server and for one stock at a time, try this:
CREATE PROCEDURE dbo.MDDCalc (
    @StartDate datetime,
    @EndDate   datetime,
    @Stock     int)
AS
BEGIN
    DECLARE @MinVal int
    DECLARE @MaxVal int
    DECLARE @MaxDate datetime

    SET @MaxVal = (
        SELECT MAX(hi)
        FROM [Table]
        WHERE Stockid = @Stock
          AND [Day] BETWEEN (@StartDate - 1) AND (@EndDate + 1))

    SET @MaxDate = (
        SELECT MIN([Day])
        FROM [Table]
        WHERE Stockid = @Stock
          AND hi = @MaxVal
          AND [Day] BETWEEN (@StartDate - 1) AND (@EndDate + 1))

    SET @MinVal = (
        SELECT MIN(low)
        FROM [Table]
        WHERE Stockid = @Stock
          AND [Day] BETWEEN (@MaxDate - 1) AND (@EndDate + 1))

    SELECT (@MaxVal - @MinVal) AS MDD
END
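A call might look like this (with illustrative parameter values):
EXEC dbo.MDDCalc @StartDate = '2009-01-01', @EndDate = '2009-12-31', @Stock = 42;
Note that this approach measures the drop from the period's highest high to the lowest low after it, which can understate the true max drawdown when the worst peak-to-trough drop does not start at the period's global high.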

I encountered this problem recently; my solution is like this:
take this data: 3,5,7,3,-1,3,-8,-3,0,10
accumulate a running sum, one value at a time; if the sum is greater than 0, reset it to 0, otherwise keep the sum. The result would be like this:
0,0,0,0,-1,0,-8,-11,-11,-1
The maximum drawdown is the lowest value in the result, -11.
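In T-SQL, that single scan can be written with the ordered variable-assignment trick used in the function answer further down; a sketch, assuming the changes are loaded into a hypothetical #chg table (the technique depends on the scan following the ORDER BY, so treat it as a sketch rather than guaranteed behavior):
CREATE TABLE #chg (seq int PRIMARY KEY, delta int);
INSERT INTO #chg
SELECT 1, 3 UNION ALL SELECT 2, 5 UNION ALL SELECT 3, 7 UNION ALL
SELECT 4, 3 UNION ALL SELECT 5, -1 UNION ALL SELECT 6, 3 UNION ALL
SELECT 7, -8 UNION ALL SELECT 8, -3 UNION ALL SELECT 9, 0 UNION ALL
SELECT 10, 10;

DECLARE @run int, @mdd int;
SELECT @run = 0, @mdd = 0;
SELECT TOP (99999)
       @run = CASE WHEN @run + delta > 0 THEN 0 ELSE @run + delta END, -- clamp the running sum at zero
       @mdd = CASE WHEN @run < @mdd THEN @run ELSE @mdd END            -- track the lowest value seen
FROM #chg
ORDER BY seq;
SELECT @mdd AS MaxDrawDown; -- -11 for this data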

Is this what you're after?
select StockID, max(drawdown) as maxdrawdown
from (
    select h.StockID,
           h.day as highdate,
           l.day as lowdate,
           h.hi - l.lo as drawdown
    from mdd h
    inner join mdd l
        on h.StockID = l.StockID
        and h.day < l.day
) x
group by StockID;
It's a SQL-based brute force approach. It compares each day's hi price with every later low price for the same stock and finds the greatest difference between the two prices. That will be the maximum draw down.
It doesn't consider the same day as a candidate for the maximum draw down, as we don't have enough info in the table to determine whether the hi price happened before the low price on the day.

Here is a SQL Server 2005 user-defined function that should return the correct answer for a single stockid very efficiently:
CREATE FUNCTION dbo.StockMaxDD (@StockID int, @day datetime) RETURNS int AS
BEGIN
    DECLARE @MaxVal int; SET @MaxVal = 0;
    DECLARE @MaxDD int; SET @MaxDD = 0;

    SELECT TOP (99999)
           @MaxDD = CASE WHEN @MaxDD < (@MaxVal - low) THEN (@MaxVal - low) ELSE @MaxDD END,
           @MaxVal = CASE WHEN hi > @MaxVal THEN hi ELSE @MaxVal END
    FROM StockHiLo
    WHERE stockid = @StockID
      AND [day] <= @day
    ORDER BY [day] ASC

    RETURN @MaxDD;
END
This would not, however, be very efficient for doing a number of stockids at the same time. If you need to do many/all of the stockids at once, then there is a similar, but substantially more difficult approach that can do that very efficiently.
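The answer doesn't show that multi-stock approach; purely as a guess at the shape it might take, here is a hedged sketch using the so-called quirky update over a clustered temp table, resetting the running values at each stock boundary (this relies on undocumented update ordering, like the ordered SELECT above, so verify it before trusting it):
CREATE TABLE #t (
    stockid int,
    [day]   datetime,
    hi      int,
    low     int,
    maxdd   int,
    PRIMARY KEY CLUSTERED (stockid, [day]));

INSERT INTO #t (stockid, [day], hi, low, maxdd)
SELECT stockid, [day], hi, low, 0
FROM StockHiLo;

DECLARE @stock int, @MaxVal int, @MaxDD int;
SELECT @stock = -1, @MaxVal = 0, @MaxDD = 0;

UPDATE #t SET
    @MaxVal = CASE WHEN stockid <> @stock THEN 0 ELSE @MaxVal END, -- reset at a new stock
    @MaxDD  = CASE WHEN stockid <> @stock THEN 0 ELSE @MaxDD END,
    @stock  = stockid,
    @MaxDD  = CASE WHEN @MaxVal - low > @MaxDD THEN @MaxVal - low ELSE @MaxDD END,
    @MaxVal = CASE WHEN hi > @MaxVal THEN hi ELSE @MaxVal END,
    maxdd   = @MaxDD
FROM #t WITH (TABLOCKX)
OPTION (MAXDOP 1);

SELECT stockid, MAX(maxdd) AS MaxDrawDown
FROM #t
GROUP BY stockid;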

Related

(Forecasting) Calculating the balance in the future

My product team has asked if I could create a very crude forecasting data table for them to work with. I have a pretty good idea of a lot of the steps I need to take, but I am stuck on figuring out how to calculate the inventory quantity for tomorrow, the next day, etc.
In my database, I am able to see our current quantity on hand. I would call that the starting balance (for today). I am then going to create an average usage field, and that will be my estimated daily sales. I will then take starting balance - estimated daily sales = ending balance. I can do that for today; my question is how do I roll that formula forward for the next 120 days?
You can use a recursive CTE to generate numbers from 0 to 120 and then calculate the day and balance from them.
DECLARE @estimated_daily_sales integer = 2;
DECLARE @starting_balance integer = 12345;

WITH cte AS
(
    SELECT 0 AS i
    UNION ALL
    SELECT i + 1
    FROM cte
    WHERE i + 1 <= 120
)
SELECT dateadd(day, i, convert(date, getdate())) AS day,
       @starting_balance - i * @estimated_daily_sales AS balance
FROM cte
OPTION (MAXRECURSION 120);

Finding the average time (in seconds) for multiple rows in SQL

I'm currently trying to find the average time between two different dates across multiple rows. So, for each created date, subtract the assigned date. Then find an average across all the datediffs (looking for a result in seconds).
declare @offset int;
declare @st_date date;
declare @en_date date;

set @offset = @BrowserTimezoneOffSet;
set @st_date = @st_datein;
set @en_date = @en_datein;

select avg(subtract) as [AVG Assigned Time]
from
(
    select DATEDIFF(ss, ign.createdDate, ign.assignDate) as subtract
    from
    (
        select DATEADD(mi, @offset, s.CreatedDateTime) as createdDate,
               DATEADD(mi, @offset, w.AuditHistoryDateTime) as assignDate
        from ServiceReq s, Audit_ServiceReq w
        where w.OwnerTeam_IsChanged = N'True' -- owner was actually changed at some point
          and s.Subject = N'General Request'
          and w.AuditHistoryUser != N'InternalServices' -- doesn't include those done automagically by the system
          and w.AuditHistoryEventType != 1 -- doesn't include creation
          and DATEADD(mi, @offset, s.CreatedDateTime) >= @st_date -- on or after the start date
          and DATEADD(mi, @offset, s.CreatedDateTime) <= @en_date -- on or before the end date
          and s.CreatedByTeam in ('IT Helpdesk', 'Unassigned') -- check the team
    ) as ign
) as dp
The above isn't returning accurate data. I'm pretty new at this and I'm not sure what I'm doing wrong.
Any help would be appreciated.
While you might be able to make this work, it's probably not the best way to approach it.
I'd either retrieve the rows and do the calculation in code (if it's guaranteed to be a relatively small number of rows), or write a stored procedure that retrieves the rows with a cursor and iterates over them to calculate what you want.
Either approach would be much more readable, and therefore much more maintainable, than an extremely complex SQL query.
I'd expect it to perform better, too. Over the years, I've consistently seen simple selects with cursor-based access for calculations outperform complex select statements.

Find closest date in SQL Server

I have a table dbo.X with DateTime column Y which may have hundreds of records.
My stored procedure has a parameter @CurrentDate; I want to find the date in column Y of the above table dbo.X which is less than and closest to @CurrentDate.
How to find it?
The where clause will match all rows with date less than @CurrentDate and, since they are sorted in descending order, the TOP 1 will be the closest date to the current date.
SELECT TOP 1 *
FROM x
WHERE x.date < @CurrentDate
ORDER BY x.date DESC
Use DATEDIFF and order your result by how many days or seconds lie between that date and the input.
Something like this:
select top 1 rowId, dateCol, datediff(second, @CurrentDate, dateCol) as SecondsBetweenDates
from myTable
where dateCol < @CurrentDate
order by datediff(second, @CurrentDate, dateCol) desc -- desc: the least negative difference is the closest earlier date
I have a better solution for this problem, I think.
I will show a few images to support and explain the final solution.
Background
In my solution I have a table of FX rates. These represent market rates for different currencies. However, our service provider has had a problem with the rate feed, and as such some rates have zero values. I want to fill in the missing data with rates for the same currency that are closest in time to the missing rate. Basically, I want to get the RateId of the nearest non-zero rate, which I will then substitute. (This is not shown here in my example.)
1) So to start off, let's identify the missing rates:
[screenshot: query showing the missing rates, i.e. those with a rate value of zero]
2) Next, let's identify the rates that are not missing.
[screenshot: query showing the rates that are not missing]
3) This query is where the magic happens. I have made an assumption here which can be removed, but which was added to improve the efficiency/performance of the query. The assumption, on line 26, is that I expect to find a substitute transaction on the same day as the missing / zero transaction.
The magic happens in line 23: the ROW_NUMBER function adds an auto number starting at 1 for the shortest time difference between the missing and the non-missing transaction. The next closest transaction gets a RowNum of 2, and so on.
Please note that in line 25 I must join on the currency so that I do not mismatch the currency types. That is, I don't want to substitute an AUD rate with CHF values. I want the closest match within the same currency.
[screenshot: combining the two data sets with a ROW_NUMBER to identify the nearest transaction]
4) Finally, let's get the rows where RowNum is 1.
[screenshot: the final query]
The full query is as follows:
; with cte_zero_rates as
(
    select *
    from fxrates
    where (spot_exp = 0 or spot_imp = 0) -- assuming the duplicated spot_exp test was meant to cover spot_imp
),
cte_non_zero_rates as
(
    select *
    from fxrates
    where (spot_exp > 0 and spot_imp > 0)
),
cte_Nearest_Transaction as
(
    select z.FXRatesID as Zero_FXRatesID,
           z.importDate as Zero_importDate,
           z.currency as Zero_Currency,
           nz.currency as NonZero_Currency,
           nz.FXRatesID as NonZero_FXRatesID,
           nz.spot_imp,
           nz.importDate as NonZero_importDate,
           DATEDIFF(ss, z.importDate, nz.importDate) as TimeDifference,
           ROW_NUMBER() over (partition by z.FXRatesID
                              order by abs(DATEDIFF(ss, z.importDate, nz.importDate)) asc) as RowNum
    from cte_zero_rates z
    left join cte_non_zero_rates nz
        on nz.currency = z.currency
        and cast(nz.importDate as date) = cast(z.importDate as date)
    -- order by z.currency desc, z.importDate desc
)
select n.Zero_FXRatesID,
       n.Zero_Currency,
       n.Zero_importDate,
       n.NonZero_importDate,
       DATEDIFF(s, n.NonZero_importDate, n.Zero_importDate) as Delay_In_Seconds,
       n.NonZero_Currency,
       n.NonZero_FXRatesID
from cte_Nearest_Transaction n
where n.RowNum = 1
  and n.NonZero_FXRatesID is not null
order by n.Zero_Currency, n.NonZero_importDate

Why doesn't this sum of percentages add up to 100%?

I have a series of calculation times in a DB2 SQL DB that are stored as float with a default value of 0.0.
The table being updated is as follows:
CREATE TABLE MY_CALC_DATA_TABLE
(
CALCDATE TIMESTAMP,
INDIV_CALC_DURATION_IN_S FLOAT WITH DEFAULT 0.0,
CALC_TIME_PERCENTAGE FLOAT WITH DEFAULT 0.0
)
Using a sproc, I am calculating the sum as follows:
CREATE OR REPLACE PROCEDURE MY_SCHEMA.MY_SPROC (IN P_DATE TIMESTAMP)
LANGUAGE SQL
NO EXTERNAL ACTION
BEGIN
    DECLARE V_TOTAL_CALC_TIME_IN_S FLOAT DEFAULT 0.0;

    -- other stuff setting up and joining data

    -- Calculate the total time taken to perform the
    -- individual calculations
    SET V_TOTAL_CALC_TIME_IN_S =
    (
        SELECT SUM(C.INDIV_CALC_DURATION_IN_S)
        FROM MY_SCHEMA.MY_CALC_DATA_TABLE C
        WHERE C.CALCDATE = P_DATE
    );

    -- Now calculate each individual calculation's percentage
    -- of the total time.
    UPDATE MY_SCHEMA.MY_CALC_DATA_TABLE C
    SET C.CALC_TIME_PERCENTAGE =
        (C.INDIV_CALC_DURATION_IN_S / V_TOTAL_CALC_TIME_IN_S) * 100
    WHERE C.CALCDATE = P_DATE;
END#
Trouble is, when I do a sum of all the CALC_TIME_PERCENTAGE values for the specified CALCDATE, it is always less than 100%, with sums like 80% or 70% for different CALCDATEs.
We are talking between 35k and 55k calculations here, with the maximum individual calculation's percentage of the total, as calculated above, being 11%, and lots of calculations in the 0.00000N% range.
To calculate the total percentage I am using the simple query:
SELECT
SUM(C.CALC_TIME_PERCENTAGE)
FROM
MY_SCHEMA.MY_CALC_DATA_TABLE C
WHERE
C.CALCDATE = P_DATE;
Any suggestions?
Update: Rearranging the calculation as suggested fixed the problem. Thanks. BTW, in DB2, FLOAT and DOUBLE are the same type. And now to read that suggested paper on floats.
If the field C.INDIV_CALC_DURATION_IN_S were INTEGER, I would assume it's a rounding error. Reading again, that is not the problem, as the datatype is FLOAT.
You can still try using this. I wouldn't be surprised if it yielded (slightly) different results than the previous method:
SET
C.CALC_TIME_PERCENTAGE =
(C.INDIV_CALC_DURATION_IN_S * 100.0 / V_TOTAL_CALC_TIME_IN_S)
But you mention that there are a lot of rows in a calculation for a certain date, so it may be a rounding error due to that. Try the DOUBLE datatype in both fields (or at least in the CALC_TIME_PERCENTAGE field) and see if the difference from 100% gets smaller.
I'm not sure if DB2 has DECIMAL(x,y) datatype. It may be more appropriate in this case.
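For what it's worth, DB2 does have DECIMAL(p,s). A hedged sketch of folding the total into the UPDATE inside the procedure above, so the division happens in decimal arithmetic (the precisions are illustrative; NULLIF guards against an all-zero total):
UPDATE MY_SCHEMA.MY_CALC_DATA_TABLE AS C
SET C.CALC_TIME_PERCENTAGE =
    (CAST(C.INDIV_CALC_DURATION_IN_S AS DECIMAL(25,10)) * 100)
    / NULLIF((SELECT SUM(CAST(INDIV_CALC_DURATION_IN_S AS DECIMAL(25,10)))
              FROM MY_SCHEMA.MY_CALC_DATA_TABLE
              WHERE CALCDATE = C.CALCDATE), 0)
WHERE C.CALCDATE = P_DATE;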
Another problem is how you find the sum of CALC_TIME_PERCENTAGE. I suppose you (and everyone else) would use:
SELECT CALCDATE, SUM(CALC_TIME_PERCENTAGE)
FROM MY_SCHEMA.MY_CALC_DATA_TABLE C
GROUP BY CALCDATE
This way, you have no way to determine in what order the summation will be done. It may not even be possible to determine that, but you can try:
SELECT CALCDATE, SUM(CALC_TIME_PERCENTAGE)
FROM
(
    SELECT CALCDATE, CALC_TIME_PERCENTAGE
    FROM MY_SCHEMA.MY_CALC_DATA_TABLE C
    ORDER BY CALCDATE, CALC_TIME_PERCENTAGE ASC
) AS tmp
GROUP BY CALCDATE
The optimizer may disregard the interior ORDER BY but it's worth a shot.
Another possibility for this big difference is that rows are deleted from the table between the UPDATE and the summing of the percentages.
You can test whether that happens by running the calculations (without the UPDATE) and summing up:
SELECT CALCDATE,
       SUM(INDIV_CALC_DURATION_IN_S * 100.0 / TMP.TOTAL) AS PERCENT_SUM
FROM MY_SCHEMA.MY_CALC_DATA_TABLE C,
     (SELECT SUM(INDIV_CALC_DURATION_IN_S) AS TOTAL
      FROM MY_SCHEMA.MY_CALC_DATA_TABLE
     ) AS TMP
GROUP BY CALCDATE
Might be a rounding problem. Try C.INDIV_CALC_DURATION_IN_S * 100 / V_TOTAL_CALC_TIME_IN_S instead.
If C.INDIV_CALC_DURATION_IN_S is very small but you have a large number of rows (and thus V_TOTAL_CALC_TIME_IN_S becomes large in comparison) then
(C.INDIV_CALC_DURATION_IN_S / V_TOTAL_CALC_TIME_IN_S) * 100
is likely to lose precision, especially if you're using FLOATs.
If this is the case, then changing the calculation (as mentioned elsewhere) to
(C.INDIV_CALC_DURATION_IN_S * 100) / V_TOTAL_CALC_TIME_IN_S
should increase the total, although it may not get you all the way to 100%.
If that's the case and a lot of the measurements are small fractions of a second, I'd consider looking beyond this procedure: could the times be recorded in, say, milli- or microseconds? Either would give you some headroom for additional significant digits.

Calculating different tariff-periods for a call in SQL Server

For a call-rating system, I'm trying to split a telephone call duration into sub-durations for different tariff-periods. The calls are stored in a SQL Server database and have a starttime and total duration. Rates are different for night (0000 - 0800), peak (0800 - 1900) and offpeak (1900-235959) periods.
For example:
A call starts at 18:50:00 and has a duration of 1000 seconds. This would make the call end at 19:06:40, making it 10 minutes / 600 seconds in the peak-tariff and 400 seconds in the off-peak tariff.
Obviously, a call can wrap over an unlimited number of periods (we do not enforce a maximum call duration). A call lasting > 24 h can wrap all 3 periods, starting in peak, going through off-peak, night and back into peak tariff.
Currently, we are calculating the different tariff-periods using recursion in VB. We calculate how much of the call falls in the same tariff-period the call starts in, change the starttime and duration of the call accordingly, and repeat this process till the full duration of the call has been reached (peakDuration + offpeakDuration + nightDuration == callDuration).
Regarding this issue, I have 2 questions:
Is it possible to do this effectively in a SQL Server statement? (I can think of subqueries or lots of coding in stored procedures, but that would not generate any performance improvement)
Will SQL Server be able to do such calculations in a way more resource-effective than the current VB scripts are doing it?
It seems to me that this is an operation with two phases.
Determine which parts of the phone call use which rates at which time.
Sum the times in each of the rates.
Phase 1 is trickier than Phase 2. I've worked the example in IBM Informix Dynamic Server (IDS) because I don't have MS SQL Server. The ideas should translate easily enough. The INTO TEMP clause creates a temporary table with an appropriate schema; the table is private to the session and vanishes when the session ends (or you explicitly drop it). In IDS, you can also use an explicit CREATE TEMP TABLE statement and then INSERT INTO temp-table SELECT ... as a more verbose way of doing the same job as INTO TEMP.
As so often in SQL questions on SO, you've not provided us with a schema, so everyone has to invent a schema that might, or might not, match what you describe.
Let's assume your data is in two tables. The first table has the call log records, the basic information about the calls made, such as the phone making the call, the number called, the time when the call started and the duration of the call:
CREATE TABLE clr -- call log record
(
phone_id VARCHAR(24) NOT NULL, -- billing plan
called_number VARCHAR(24) NOT NULL, -- needed to validate call
start_time TIMESTAMP NOT NULL, -- date and time when call started
duration INTEGER NOT NULL -- duration of call in seconds
CHECK(duration > 0),
PRIMARY KEY(phone_id, start_time)
-- other complicated range-based constraints omitted!
-- foreign keys omitted
-- there would probably be an auto-generated number here too.
);
INSERT INTO clr(phone_id, called_number, start_time, duration)
VALUES('650-656-3180', '650-794-3714', '2009-02-26 15:17:19', 186234);
For convenience (mainly to save writing the addition multiple times), I want a copy of the clr table with the actual end time:
SELECT phone_id, called_number, start_time AS call_start, duration,
start_time + duration UNITS SECOND AS call_end
FROM clr
INTO TEMP clr_end;
The tariff data is stored in a simple table:
CREATE TABLE tariff
(
tariff_code CHAR(1) NOT NULL -- code for the tariff
CHECK(tariff_code IN ('P','N','O'))
PRIMARY KEY,
rate_start TIME NOT NULL, -- time when rate starts
rate_end TIME NOT NULL, -- time when rate ends
rate_charged DECIMAL(7,4) NOT NULL -- rate charged (cents per second)
);
INSERT INTO tariff(tariff_code, rate_start, rate_end, rate_charged)
VALUES('N', '00:00:00', '08:00:00', 0.9876);
INSERT INTO tariff(tariff_code, rate_start, rate_end, rate_charged)
VALUES('P', '08:00:00', '19:00:00', 2.3456);
INSERT INTO tariff(tariff_code, rate_start, rate_end, rate_charged)
VALUES('O', '19:00:00', '23:59:59', 1.2345);
I debated whether the tariff table should use TIME or INTERVAL values; in this context, the times are very similar to intervals relative to midnight, but intervals can be added to timestamps where times cannot. I stuck with TIME, but it made things messy.
The tricky part of this query is generating the relevant date and time ranges for each tariff without loops. In fact, I ended up using a loop embedded in a stored procedure to generate a list of integers. (I also used a technique that is specific to IBM Informix Dynamic Server, IDS, using the table ID numbers from the system catalog as a source of contiguous integers in the range 1..N, which works for numbers from 1 to 60 in version 11.50.)
CREATE PROCEDURE integers(lo INTEGER DEFAULT 0, hi INTEGER DEFAULT 0)
RETURNING INT AS number;
DEFINE i INTEGER;
FOR i = lo TO hi STEP 1
RETURN i WITH RESUME;
END FOR;
END PROCEDURE;
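As a quick sanity check, the procedure can be exercised on its own, mirroring the TABLE(...) form used in the queries below (IDS syntax):
SELECT number
FROM TABLE(integers(0, 4)) AS t(number); -- returns the five rows 0..4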
In the simple case (and the most common case), the call falls in a single-tariff period; the multi-period calls add the excitement.
Let's assume we can create a table expression that matches this schema and covers all the timestamp values we might need:
CREATE TEMP TABLE tariff_date_time
(
tariff_code CHAR(1) NOT NULL,
rate_start TIMESTAMP NOT NULL,
rate_end TIMESTAMP NOT NULL,
rate_charged DECIMAL(7,4) NOT NULL
);
Fortunately, you haven't mentioned weekend rates, so you charge the customers the same rates at the weekend as during the week. However, the answer should adapt to such situations if at all possible. If you were to get as complex as giving weekend rates on public holidays, except that at Christmas or New Year you charge peak rate instead of weekend rate because of the high demand, then you would be best off storing the rates in a permanent tariff_date_time table.
The first step in populating tariff_date_time is to generate a list of dates which are relevant to the calls:
SELECT DISTINCT EXTEND(DATE(call_start) + number, YEAR TO SECOND) AS call_date
FROM clr_end,
TABLE(integers(0, (SELECT DATE(call_end) - DATE(call_start) FROM clr_end)))
AS date_list(number)
INTO TEMP call_dates;
The difference between the two date values is an integer number of days (in IDS).
The procedure integers generates values from 0 to the number of days covered by the call and stores the result in a temp table. For the more general case of multiple records, it might be better to calculate the minimum and maximum dates and generate the dates in between rather than generate dates multiple times and then eliminate them with the DISTINCT clause.
Now use a cartesian product of the tariff table with the call_dates table to generate the rate information for each day. This is where the tariff times would be neater as intervals.
SELECT r.tariff_code,
d.call_date + (r.rate_start - TIME '00:00:00') AS rate_start,
d.call_date + (r.rate_end - TIME '00:00:00') AS rate_end,
r.rate_charged
FROM call_dates AS d, tariff AS r
INTO TEMP tariff_date_time;
Now we need to match the call log record with the tariffs that apply. The condition is a standard way of dealing with overlaps - two time periods overlap if the end of the first is later than the start of the second and if the start of the first is before the end of the second:
SELECT tdt.*, clr_end.*
FROM tariff_date_time tdt, clr_end
WHERE tdt.rate_end > clr_end.call_start
AND tdt.rate_start < clr_end.call_end
INTO TEMP call_time_tariff;
Then we need to establish the start and end times for the rate. The start time for the rate is the later of the start time for the tariff and the start time of the call. The end time for the rate is the earlier of the end time for the tariff and the end time of the call:
SELECT phone_id, called_number, tariff_code, rate_charged,
call_start, duration,
CASE WHEN rate_start < call_start THEN call_start
ELSE rate_start END AS rate_start,
CASE WHEN rate_end >= call_end THEN call_end
ELSE rate_end END AS rate_end
FROM call_time_tariff
INTO TEMP call_time_tariff_times;
Finally, we need to sum the times spent at each tariff rate, and take that time (in seconds) and multiply by the rate charged. Since the result of SUM(rate_end - rate_start) is an INTERVAL, not a number, I had to invoke a conversion function to convert the INTERVAL into a DECIMAL number of seconds, and that (non-standard) function is iv_seconds:
SELECT phone_id, called_number, tariff_code, rate_charged,
call_start, duration,
SUM(rate_end - rate_start) AS tariff_time,
rate_charged * iv_seconds(SUM(rate_end - rate_start)) AS tariff_cost
FROM call_time_tariff_times
GROUP BY phone_id, called_number, tariff_code, rate_charged,
call_start, duration;
For the sample data, this yielded the data (where I'm not printing the phone number and called number for compactness):
N 0.9876 2009-02-26 15:17:19 186234 0 16:00:00 56885.760000000
O 1.2345 2009-02-26 15:17:19 186234 0 10:01:11 44529.649500000
P 2.3456 2009-02-26 15:17:19 186234 1 01:42:41 217111.081600000
That's a very expensive call, but the telco will be happy with that. You can poke at any of the intermediate results to see how the answer is derived. You can use fewer temporary tables at the cost of some clarity.
For a single call, this will not be much different than running the code in VB in the client. For a lot of calls, this has the potential to be more efficient. I'm far from convinced that recursion is necessary in VB - straight iteration should be sufficient.
-- table: kar_vasile(id, vid, datein, timein, timeout, bikari, tozihat)
-- the bikari field is unemployment (idle) time; you can remove it anywhere if not needed
select id,
       vid,
       datein,
       timein,
       timeout,
       bikari,
       hourwork = case
                      when timein <= timeout
                      then SUM(abs(DATEDIFF(mi, timein, timeout)) - bikari) / 60
                      -- calculate hours
                      else SUM(abs(DATEDIFF(mi, timein, '23:59:00:00') + DATEDIFF(mi, '00:00:00', timeout) + 1) - bikari) / 60
                      -- calculate hours when starttime is later than endtime
                  end,
       minwork  = case
                      when timein <= timeout
                      then SUM(abs(DATEDIFF(mi, timein, timeout)) - bikari) % 60
                      -- calculate minutes
                      else SUM(abs(DATEDIFF(mi, timein, '23:59:00:00') + DATEDIFF(mi, '00:00:00', timeout) + 1) - bikari) % 60
                      -- calculate minutes when starttime is later than endtime
                  end,
       tozihat
from kar_vasile
group by id, vid, datein, timein, timeout, tozihat, bikari
Effectively in T-SQL? I suspect not, with the schema as described at present.
It might be possible, however, if your rate table stores the three tariffs for each date. There is at least one reason why you might do this, apart from the problem at hand: it's likely at some point that rates for one period or another might change and you may need to have the historic rates available.
So say we have these tables:
CREATE TABLE rates (
from_date_time DATETIME
, to_date_time DATETIME
, rate MONEY
)
CREATE TABLE calls (
id INT
, started DATETIME
, ended DATETIME
)
I think there are three cases to consider (may be more, I'm making this up as I go):
a call occurs entirely within one rate period
a call starts in one rate period (a) and ends in the next (b)
a call spans at least one complete rate period
Assuming rate is per second, I think you might produce something like the following (completely untested) query
SELECT id, DATEDIFF(ss, started, ended) * rate /* case 1 */
FROM rates JOIN calls ON started > from_date_time AND ended < to_date_time
UNION
SELECT id, DATEDIFF(ss, started, to_date_time) * rate /* case 2a and the start of case 3 */
FROM rates JOIN calls ON started > from_date_time AND ended > to_date_time
UNION
SELECT id, DATEDIFF(ss, from_date_time, ended) * rate /* case 2b and the last part of case 3 */
FROM rates JOIN calls ON started < from_date_time AND ended < to_date_time
UNION
SELECT id, DATEDIFF(ss, from_date_time, to_date_time) * rate /* case 3 for entire rate periods, should pick up all complete periods */
FROM rates JOIN calls ON started < from_date_time AND ended > to_date_time
You could apply a SUM..GROUP BY over that in SQL or handle it in your code. Alternatively, with carefully-constructed logic, you could probably merge the UNIONed parts into a single WHERE clause with lots of ANDs and ORs. I thought the UNION showed the intent rather more clearly.
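For illustration, that SUM..GROUP BY wrapper might look like this (the four arm predicates are copied verbatim from the completely untested query above; note UNION ALL rather than UNION, so two periods of a call that happen to cost the same are not collapsed before the SUM):
SELECT id, SUM(cost) AS call_cost
FROM (
    SELECT id, DATEDIFF(ss, started, ended) * rate AS cost /* case 1 */
    FROM rates JOIN calls ON started > from_date_time AND ended < to_date_time
    UNION ALL
    SELECT id, DATEDIFF(ss, started, to_date_time) * rate /* case 2a and the start of case 3 */
    FROM rates JOIN calls ON started > from_date_time AND ended > to_date_time
    UNION ALL
    SELECT id, DATEDIFF(ss, from_date_time, ended) * rate /* case 2b and the last part of case 3 */
    FROM rates JOIN calls ON started < from_date_time AND ended < to_date_time
    UNION ALL
    SELECT id, DATEDIFF(ss, from_date_time, to_date_time) * rate /* case 3, whole periods */
    FROM rates JOIN calls ON started < from_date_time AND ended > to_date_time
) AS per_period
GROUP BY id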
HTH & HIW (Hope It Works...)
This is a thread about your problem that we had over at sqlteam.com. Take a look, because it includes some pretty slick solutions.
Following on from Mike Woodhouse's answer, this may work for you:
SELECT id, SUM(DATEDIFF(ss,
           CASE WHEN started < from_date_time THEN from_date_time ELSE started END, -- clamp the call start to the period start
           CASE WHEN ended > to_date_time THEN to_date_time ELSE ended END)         -- clamp the call end to the period end
       * rate)
FROM rates
JOIN calls ON started < to_date_time
          AND ended > from_date_time
GROUP BY id
An actual schema for the relevant tables in your database would have been very helpful. I'll take my best guesses. I've assumed that the Rates table has start_time and end_time as the number of minutes past midnight.
Using a calendar table (a VERY useful table to have in most databases):
SELECT
    C.id,
    R.rate,
    SUM(DATEDIFF(ss,
        CASE
            WHEN C.start_time < R.rate_start_time THEN R.rate_start_time
            ELSE C.start_time
        END,
        CASE
            WHEN C.end_time > R.rate_end_time THEN R.rate_end_time
            ELSE C.end_time
        END)) AS seconds_at_rate
FROM
    Calls C
    CROSS APPLY -- APPLY rather than JOIN, because R refers to C.start_time
    (
        SELECT
            DATEADD(mi, Rates.start_time, CAL.calendar_date) AS rate_start_time,
            DATEADD(mi, Rates.end_time, CAL.calendar_date) AS rate_end_time,
            Rates.rate
        FROM
            Calendar CAL
            INNER JOIN Rates ON 1 = 1
        WHERE
            CAL.calendar_date >= DATEADD(dy, -1, C.start_time) AND
            CAL.calendar_date <= C.start_time
    ) AS R
WHERE
    R.rate_start_time < C.end_time AND
    R.rate_end_time > C.start_time
GROUP BY
    C.id,
    R.rate
I just came up with this as I was typing, so it's untested and you will very likely need to tweak it, but hopefully you can see the general idea.
I also just realized that you use a start_time and a duration for your calls. You can just replace C.end_time wherever you see it with DATEADD(ss, C.duration, C.start_time), assuming that the duration is in seconds.
This should perform pretty quickly in any decent RDBMS assuming proper indexes, etc.
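A hedged sketch of the sort of indexes that would help, with names assumed from the query above:
CREATE INDEX IX_Calls_start_time ON Calls (start_time);
CREATE INDEX IX_Calendar_date ON Calendar (calendar_date);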
Provided that your calls last less than 100 days:
WITH generate_range(item) AS
(
    SELECT 0
    UNION ALL
    SELECT item + 1
    FROM generate_range
    WHERE item < 100
)
SELECT tday, id, span
FROM (
    SELECT tday, id,
           DATEDIFF(minute,
               CASE WHEN tbegin < clbegin THEN clbegin ELSE tbegin END,
               CASE WHEN tend < clend THEN tend ELSE clend END
           ) AS span
    FROM (
        SELECT DATEADD(day, item, DATEDIFF(day, 0, clbegin)) AS tday,
               ti.id,
               DATEADD(minute, rangestart, DATEADD(day, item, DATEDIFF(day, 0, clbegin))) AS tbegin,
               DATEADD(minute, rangeend, DATEADD(day, item, DATEDIFF(day, 0, clbegin))) AS tend
        FROM calls, generate_range, tariff ti
        WHERE DATEADD(day, 1, DATEDIFF(day, 0, clend)) > DATEADD(day, item, DATEDIFF(day, 0, clbegin))
    ) t1
) t2
WHERE span > 0
I'm assuming you keep your tariff ranges in minutes from midnight and count lengths in minutes too.
The big problem with performing this kind of calculation at the database level is that it takes resource away from your database while it's going on, both in terms of CPU and availability of rows and tables via locking. If you were calculating 1,000,000 tariffs as part of a batch operation, then that might run on the database for a long time and during that time you'd be unable to use the database for anything else.
If you have the resource, retrieve all the data you need with one transaction and do all the logic calculations outside the database, in a language of your choice. Then insert all the results. Databases are for storing and retrieving data, and any business logic they perform should be kept to an absolute bare minimum at all times. Whilst brilliant at some things, SQL isn't the best language for date or string manipulation work.
I suspect you're already on the right lines with your VB work, and without knowing more it certainly feels like a recursive, or at least an iterative, problem to me. When done correctly, recursion can be a powerful and elegant solution to a problem. Tying up the resources of your database very rarely is.