Let's say I have two tables in my Oracle database
Table A : stDate, endDate, salary
For example:
03/02/2010 28/02/2010 2000
05/03/2012 29/03/2012 2500
Table B : DateOfActivation, rate
For example:
01/01/2010 1.023
01/11/2011 1.063
01/01/2012 1.075
I would like a SQL query that displays the sum of salaries from Table A, with each salary multiplied by the Table B rate that applies, based on the activation date.
Here, the first salary gets the first rate (1.023), because the second rate's activation date falls after the stDate-endDate interval.
For the second salary, the third rate applies, because that rate's activation date falls before the interval of the second salary.
So the sum is: 2000 * 1.023 + 2500 * 1.075 = 4733.5
I hope I am clear
Thanks
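Before writing SQL, the intended lookup can be sketched in plain Python (a hypothetical helper over the question's sample data) to pin down the rule: for each salary row, take the most recent rate activated before stDate.

```python
from datetime import date

# Sample data from the question
salaries = [(date(2010, 2, 3), date(2010, 2, 28), 2000),   # stDate, endDate, salary
            (date(2012, 3, 5), date(2012, 3, 29), 2500)]
rates = [(date(2010, 1, 1), 1.023),
         (date(2011, 11, 1), 1.063),
         (date(2012, 1, 1), 1.075)]

def applicable_rate(st_date):
    # Most recent rate whose activation date precedes the salary period
    return max((r for r in rates if r[0] < st_date), key=lambda r: r[0])[1]

total = sum(salary * applicable_rate(st) for st, _, salary in salaries)
# 2000 * 1.023 + 2500 * 1.075 = 4733.5
print(total)
```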
Assuming the rate must be active before the beginning of the interval (i.e. DateOfActivation < stDate), you could do something like this (see fiddle):
SELECT SUM(salary*
(SELECT rate from TableB WHERE DateOfActivation=
(SELECT MAX(DateOfActivation) FROM TableB WHERE DateOfActivation < stDate)
)) FROM TableA;
This problem becomes much easier if DateOfActivation is a true effective-dated table with rate_start_date and rate_end_date columns, such that a new row cannot be created whose start or end date lies within an existing rate_start_date/rate_end_date pair. The currently active row would typically have a NULL rate_end_date. In addition, you would likely want an EMP_ID on the salary table to be able to sum the rows and finish the calculation, and one needs to consider the following cases:
Start Date is between rate_start and rate_end
End Date is between rate_start and rate_end
Rate_start and Rate_end are between start_date and end_date (sandwiched)
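The three cases above are just the standard interval-overlap test, which collapses into a single condition; a small illustrative sketch (hypothetical helper, not the answer's SQL):

```python
from datetime import date

def overlaps(st_date, end_date, rate_start, rate_end):
    # True when [st_date, end_date] and [rate_start, rate_end] share a day;
    # this covers all three enumerated cases (start inside, end inside, sandwiched)
    return st_date <= rate_end and end_date >= rate_start

# A rate effective all of 2012 overlaps a March 2012 salary period
print(overlaps(date(2012, 3, 5), date(2012, 3, 29),
               date(2012, 1, 1), date(2012, 12, 31)))
```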
If you run the following snippet you will see we can artificially create our rate_end_dates as follows:
SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE
ORDER BY D.ACTIVEDATE
Proposed code is as follows:
SELECT DISTINCT * FROM
(SELECT S.*, T.RATE, S.SALARY*T.RATE
FROM XX_SAL_HIST S,
(SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE) T -- creating synthetic rate_end_date
WHERE S.STDATE BETWEEN T.ACTIVEDATE AND T.ENDDATE)
UNION
(SELECT S.*, T.RATE, S.SALARY*T.RATE
FROM XX_SAL_HIST S,
(SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE) T -- creating synthetic rate_end_date
WHERE S.ENDDATE BETWEEN T.ACTIVEDATE AND T.ENDDATE)
UNION
(SELECT S.*, T.RATE, S.SALARY*T.RATE
FROM XX_SAL_HIST S,
(SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE) T -- creating synthetic rate_end_date
WHERE T.ACTIVEDATE BETWEEN S.STDATE AND S.ENDDATE)
The first thing to do is to transform Table B (Table2 in the query) so that each row has a start and an end date:
Select DateOfActivation AS startDate
, rate
, NVL(LEAD(DateOfActivation, 1) OVER (ORDER BY DateOfActivation)
, TO_DATE('9999/12/31', 'yyyy/mm/dd')) AS endDate
From Table2
Now we can join this table with Table A (Table1 in the query)
WITH Rates AS (
Select DateOfActivation AS startDate
, rate
, NVL(LEAD(DateOfActivation, 1) OVER (ORDER BY DateOfActivation)
, TO_DATE('9999/12/31', 'yyyy/mm/dd')) AS endDate
From Table2)
Select SUM(s.salary * r.rate)
From Rates r
INNER JOIN Table1 s ON s.stDate < r.endDate AND s.endDate > r.startDate
The JOIN condition gets every row in Table A that is at least partially within the activation period of the rate; if you need the containment to be fully inclusive, you can alter it as in the following query:
WITH Rates AS (
Select DateOfActivation AS startDate
, rate
, NVL(LEAD(DateOfActivation, 1) OVER (ORDER BY DateOfActivation)
, TO_DATE('9999/12/31', 'yyyy/mm/dd')) AS endDate
From Table2)
Select SUM(s.salary * r.rate)
From Rates r
INNER JOIN Table1 s ON s.stDate >= r.startDate AND s.endDate <= r.endDate
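To see what the Rates CTE produces, here is the LEAD transform emulated in Python (illustrative only, using the question's Table B):

```python
from datetime import date

rates = [(date(2010, 1, 1), 1.023),
         (date(2011, 11, 1), 1.063),
         (date(2012, 1, 1), 1.075)]

# Emulate LEAD(DateOfActivation) OVER (ORDER BY DateOfActivation):
# each rate runs from its own activation date until the next activation date.
far_future = date(9999, 12, 31)  # stands in for TO_DATE('9999/12/31', ...)
ranged = [(start,
           rates[i + 1][0] if i + 1 < len(rates) else far_future,
           rate)
          for i, (start, rate) in enumerate(rates)]
for start, end, rate in ranged:
    print(start, end, rate)
```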
My data is in the form:
To reproduce:
DROP TABLE IF EXISTS SALARY;
CREATE TEMP TABLE salary
(
Employee varchar(100),
Salary1 numeric(38,12),
Salary2 numeric(38,12)
);
INSERT INTO salary (Employee, Salary1 ,Salary2)
VALUES ('A1',100,300),('A2',200,300),('A3',300,450),('A4',400,600);
I want to divide it evenly (as we have data for 2 days of salary aggregated into one column) and recast it as daily-level data as below:
Hence, for employee A2, the sum of salary for May 3rd and 4th would be 300 (150 + 150 from the 2nd table).
Any help/leads appreciated.
A materialized calendar table with the desired dates will facilitate generating the dates needed for the query. Without one, a tally table or CTE (as in the below example) is an alternative method.
DECLARE
@StartDate date = '2022-05-01'
, @DaysPerSalary int = 2;
WITH
t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
,tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c)
,salary1 AS (
SELECT Employee, Salary1 / @DaysPerSalary AS Salary
FROM SALARY
)
,salary2 AS (
SELECT Employee, Salary2 / @DaysPerSalary AS Salary
FROM SALARY
)
SELECT DATEADD(day, tally.num-1, @StartDate) AS SalaryDate, Employee, Salary
FROM tally
CROSS JOIN salary1
WHERE tally.num <= @DaysPerSalary
UNION ALL
SELECT DATEADD(day, tally.num-1 + @DaysPerSalary, @StartDate), Employee, Salary
FROM tally
CROSS JOIN salary2
WHERE tally.num <= @DaysPerSalary
ORDER BY Employee, SalaryDate;
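The same splitting logic, sketched in Python over the sample rows (the 2022-05-01 start date is an assumption, as in the SQL above):

```python
from datetime import date, timedelta

# (Salary1, Salary2) per employee, each value covering 2 days
salary = {'A1': (100, 300), 'A2': (200, 300), 'A3': (300, 450), 'A4': (400, 600)}
start_date = date(2022, 5, 1)
days_per_salary = 2

daily = []  # (day, employee, amount)
for emp, (s1, s2) in salary.items():
    for i in range(days_per_salary):
        # Salary1 covers the first two days, Salary2 the next two
        daily.append((start_date + timedelta(days=i), emp, s1 / days_per_salary))
        daily.append((start_date + timedelta(days=days_per_salary + i), emp, s2 / days_per_salary))

# Employee A2, May 3rd + 4th: 150 + 150 = 300, matching the question
a2 = sum(amt for day, emp, amt in daily
         if emp == 'A2' and day in (date(2022, 5, 3), date(2022, 5, 4)))
print(a2)
```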
This is what the data looks like. It's a long table.
I need to calculate the number of people employed by day.
How do I write SQL Server logic to get this result? I tried to create a DATES table and then join, but this caused an error because the table is too big. Do I need recursive logic?
For future questions, don't post images of data; instead, use a service like dbfiddle. I'll add a sketch of an answer anyway; with a better-prepared question you could have gotten a complete one. Here it goes:
-- extrema is the least and the greatest date in staff table
with extrema(mn, mx) as (
select least(min(hired),min(retired)) as mn
, greatest(max(hired),max(retired)) as mx
from staff
), calendar (dt) as (
-- we construct a calendar with every date between extreme values
select mn from extrema
union all
select dateadd(day, 1, dt)
from calendar
where dt < (select mx from extrema)
)
-- finally we can count the number of employed people for each such date
select dt, count(1)
from calendar c
join staff s
on c.dt between s.hired and s.retired
group by dt;
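The same idea in Python, for anyone who wants to sanity-check the join logic on a couple of hypothetical staff rows:

```python
from datetime import date, timedelta

# Hypothetical staff rows: (hired, retired)
staff = [(date(2020, 1, 1), date(2020, 1, 3)),
         (date(2020, 1, 2), date(2020, 1, 5))]

# Build the calendar between the extreme dates, then count per day,
# mirroring "c.dt between s.hired and s.retired"
mn = min(h for h, r in staff)
mx = max(r for h, r in staff)
headcount = {}
for i in range((mx - mn).days + 1):
    d = mn + timedelta(days=i)
    headcount[d] = sum(1 for h, r in staff if h <= d <= r)

print(headcount[date(2020, 1, 2)])  # both people are employed that day
```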
If you find yourself doing this kind of calculation often, it is a good idea to create a calendar table. You can add other attributes to it, such as whether it is a day off, whether it falls in the middle of the week, etc.
With a constraint as:
CHECK(hired <= retired)
the first part can be simplified to:
with extrema(mn, mx) as (
select min(hired) as mn
, max(retired) as mx
from staff
),
Assuming Current Employees have a NULL retirement date
Declare @Date1 date = '2015-01-01'
Declare @Date2 date = getdate()
Select A.Date
,HeadCount = count(B.name)
From ( Select Top (DateDiff(DAY,@Date1,@Date2)+1)
Date=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),@Date1)
From master..spt_values n1,master..spt_values n2
) A
Left Join YourTable B on A.Date >= B.Hired and A.Date <= coalesce(B.Retired,getdate())
Group BY A.Date
You need a calendar table for this. You start with the calendar, and LEFT JOIN everything else, using BETWEEN logic.
You can use a real table. Or you can generate it on the fly, like this:
WITH
L0 AS ( SELECT c = 1
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT c = 1 FROM L0 A, L0 B, L0 C, L0 D ),
Nums AS ( SELECT rownum = ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM L1 ),
Dates AS (
SELECT TOP (DATEDIFF(day, '20141231', GETDATE()))
Date = DATEADD(day, rownum, '20141231')
FROM Nums
)
SELECT
d.Date,
NumEmployed = COUNT(*)
FROM Dates d
JOIN YourTable t ON d.Date BETWEEN t.Hired AND t.Retired
GROUP BY
d.Date;
If your dates have a time component then you need to use >= AND < logic
Try limiting the scope of your date table. In this example I have a table of dates named TallyStickDT.
SELECT dt, COUNT(name)
FROM (
SELECT dt
FROM tallystickdt
WHERE dt >= (SELECT MIN(hired) FROM #employees)
AND dt <= GETDATE()
) A
LEFT OUTER JOIN #employees E ON A.dt >= E.Hired AND A.dt <= e.retired
GROUP BY dt
ORDER BY dt
Table 1 = emp - (emp_id, store_id, start_dt, end_dt, amount)
Table 2 = Sales - (emp_id, product_id, sales_dateline, qty, amount)
Table 1 has data like this
emp_id store_id amount start_dt, end_dt
1 1 200 2/21/2019 10/21/2019
1 2 400 10/22/2019 12/31/2019
How can we find the top 3 employees working in each store during Q4 of 2019 (sales_dateline column)?
Note: we need to consider the amount from the previous store as well for each employee. store_id should be displayed in the result set. Please help.
This query gets the top 3 employees who worked in multiple stores during Q4 of 2019, based on the sum of the amount column of Table 2 (filtered on sales_dateline). The total sales per employee include sales from each of the store_ids the employee worked in.
Step 1: (cte named 'q4_sales_per_employee_per_store'): Summarize the sales (sum_sales) per employee per store in q4 of 2019.
Step 2: (cte named 'q4_sales_per_multi_store_employee'): Summarize the aggregate sales (sum_sales) per
employee in q4 of 2019.
Step 3: (cte named 'q4_multi_store_employee_rankings'): Rank [row rank and dense rank (ties are given equal rank)] of aggregate sales (sum_sales) per employee in q4 of 2019.
Step 4: Select the top 3 multi-store employees with the highest sales (across all stores), based on dense rank (which could yield more than 3 employees due to ties), JOINed with the per-store sales.
Something like this
with
q4_sales_per_employee_per_store(emp_id, store_id, sum_qty, sum_amount) as (
select t1.emp_id, t1.store_id, sum(t2.qty), sum(t2.amount)
from Table1 t1
join Table2 t2 on t1.emp_id=t2.emp_id
where t2.sales_dateline>=cast('20191001' as date)
and t2.sales_dateline<cast('20200101' as date)
group by t1.emp_id, t1.store_id),
q4_sales_per_multi_store_employee(emp_id, sum_qty, sum_amount) as (
select qs.emp_id, sum(qs.sum_qty), sum(qs.sum_amount)
from q4_sales_per_employee_per_store qs
group by qs.emp_id
having count(*)>1),
q4_multi_store_employee_rankings(emp_id, sum_qty, sum_amount, sales_row, sales_rank) as (
select emp_id, sum_qty, sum_amount,
row_number() over (order by sum_amount desc) sales_row,
dense_rank() over (order by sum_amount desc) sales_rank
from q4_sales_per_multi_store_employee)
select mser.sales_rank, mser.sales_row, mser.emp_id,
spe.store_id, spe.sum_qty, spe.sum_amount
from q4_multi_store_employee_rankings mser
join q4_sales_per_employee_per_store spe on mser.emp_id=spe.emp_id
where mser.sales_rank<=3
order by mser.sales_rank, mser.sales_row;
I think your answer involves several logical steps. I'm calling them 'logical steps' because they will be combined into one statement at the end (so the solution only has one query).
I'll start with adding variables for start and end dates rather than hardcoding them.
-- Period variables to define start and end of quarter
DECLARE @PeriodStart date = '20191001';
DECLARE @PeriodEnd date = '20200101';
Note that when used, we use >= @PeriodStart but < @PeriodEnd, so the end boundary is midnight at the start of the first day of the next quarter.
Then you need to work out the sales for each employee, regardless of store. I think you can do this simply with something like
SELECT emp_id, SUM(amount) AS TotalSales
FROM [sales] S
WHERE sales_dateline >= @PeriodStart AND sales_dateline < @PeriodEnd
GROUP BY emp_id;
Notes
I don't know how you want to do date filtering for sales: the name of the sales_dateline field implies it may not simply be a datetime, but possibly a text field or a reference to another table. While I've assumed a datetime above, I will leave any modifications up to you.
I assume 'amount' in the Sales table is the total for the sale of that product, not the unit price. If it's the unit price, you need to multiply it by the qty field, e.g., SUM(amount * qty).
Then, you need to determine which store each person was last working in during Q4 2019 (Oct 1 - Dec 31). This means that if an employee started in Store 1 during the quarter, then moved to Store 2, their record will be for Store 2 (but will take into account sales from Store 1).
The below gives an approach for this
It first finds every store an employee worked in within the relevant period (regardless of how long, or over how many different sessions).
It then keeps the record with the latest start_dt within that period; that will be the store the Employee is assigned to for reporting.
SELECT store_id, emp_id
FROM
(SELECT store_id, emp_id,
ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY start_dt DESC) AS rn
FROM [emp] E
WHERE (start_dt >= @PeriodStart AND start_dt < @PeriodEnd) -- started in period
OR (end_dt >= @PeriodStart AND end_dt < @PeriodEnd) -- ended in period
OR (start_dt < @PeriodStart AND end_dt >= @PeriodEnd) -- worked through period
) AS AllEmpStores
WHERE AllEmpStores.rn = 1;
Note that in the previous version of my answer, I just reported all stores each employee worked at within that period. Therefore, if a great salesperson worked at 10 stores within the period, they could feasibly show in the top-3 list for every store. The code for that was
SELECT DISTINCT E.Store_ID, E.Emp_ID
FROM [Emp] E
WHERE (E.start_dt >= @PeriodStart AND E.start_dt < @PeriodEnd) -- started in period
OR (E.end_dt >= @PeriodStart AND E.end_dt < @PeriodEnd) -- ended in period
OR (E.start_dt < @PeriodStart AND E.end_dt >= @PeriodEnd); -- worked through period
Now that you have the two datasets (sales per employee, and employees by store), you can join the first to the second, so you have Store_ID, Emp_ID, and Total_Sales (across all stores).
At that point you just need the top 3 per store, which you can do with a windowed function.
DECLARE @PeriodStart date = '20191001';
DECLARE @PeriodEnd date = '20200101';
WITH S_CTE AS
(SELECT emp_id, SUM(amount) AS TotalSales
FROM [sales] S
WHERE sales_dateline >= @PeriodStart AND sales_dateline < @PeriodEnd
GROUP BY emp_id
),
E_CTE AS
(SELECT store_id, emp_id
FROM
(SELECT store_id, emp_id,
ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY start_dt DESC) AS rn
FROM [emp] E
WHERE (start_dt >= @PeriodStart AND start_dt < @PeriodEnd)
OR (end_dt >= @PeriodStart AND end_dt < @PeriodEnd)
OR (start_dt < @PeriodStart AND end_dt >= @PeriodEnd)
) AS AllEmpStores
WHERE AllEmpStores.rn = 1
)
SELECT store_ID, emp_ID, TotalSales
FROM
(SELECT E_CTE.store_ID, E_CTE.emp_ID, S_CTE.TotalSales,
DENSE_RANK() OVER (PARTITION BY E_CTE.store_ID ORDER BY S_CTE.TotalSales DESC) AS SalesRank
FROM E_CTE
INNER JOIN S_CTE ON E_CTE.emp_id = S_CTE.emp_id
) AS A
WHERE A.SalesRank <= 3
Edits/updates:
Now finds 'latest' store per employee only (rather than all stores) within period
Clarified filtering on sales
Added DENSE_RANK based on question comments
... And I just fixed a typo where E_CTE and S_CTE were the wrong way around.
I have in the past written queries that give me counts by date (hires, terminations, etc...) as follows:
SELECT per.date_start AS "Date",
COUNT(peo.EMPLOYEE_NUMBER) AS "Hires"
FROM hr.per_all_people_f peo,
hr.per_periods_of_service per
WHERE per.date_start BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
AND per.date_start BETWEEN :PerStart AND :PerEnd
AND per.person_id = peo.person_id
GROUP BY per.date_start
I was now looking to create a count of active employees by date; however, I am not sure how I would date the query, since I use a range to determine who is active, as such:
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.current_employee_flag = 'Y'
and TRUNC(sysdate) BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
Here is a simple way to get started. This works for all the effective and end dates in your data:
select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
It works by adding one person for each start and subtracting one for each end (via num) and doing a cumulative sum. This might produce duplicate dates, so you might also aggregate to eliminate those duplicates:
select thedate, max(numActives)
from (select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
) t
group by thedate;
If you really want all dates, then it is best to start with a calendar table, and use a simple variation on your original query:
select c.thedate, count(pos.effective_start_date) as NumActives
from calendar c left outer join
hr.per_periods_of_service pos
on c.thedate between pos.effective_start_date and pos.effective_end_date
group by c.thedate;
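The +1/-1 cumulative-sum trick is easy to verify in Python on a couple of hypothetical service periods:

```python
from datetime import date

# Hypothetical periods of service: (effective_start_date, effective_end_date)
periods = [(date(2020, 1, 1), date(2020, 1, 10)),
           (date(2020, 1, 5), date(2020, 1, 20))]

# +1 at every start date, -1 at every end date, cumulative sum by date
events = sorted([(s, 1) for s, e in periods] + [(e, -1) for s, e in periods])
running = 0
actives = {}
for d, num in events:
    running += num
    actives[d] = running  # last value per date plays the role of MAX(numActives)

print(actives[date(2020, 1, 5)])  # 2 people active once the second period starts
```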
If you want to count all employees who were active during the entire input date range
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.[EFFECTIVE_START_DATE] <= :StartDate
AND (peo.[EFFECTIVE_END_DATE] IS NULL OR peo.[EFFECTIVE_END_DATE] >= :EndDate)
Here is my example, based on Gordon Linoff's answer,
with a little modification: in the subtracting branch, every record appeared with -1 in NUM even when END DATE was NULL (i.e., the employee is still active), so those rows are filtered out.
USE AdventureWorksDW2012 -- selects the database in SSMS; may not work on other platforms
select
t.thedate
,max(t.numActives) AS "Total Active Employees"
from (
select
dates.thedate
,SUM(dates.num) over (order by dates.thedate) as numActives
from
(
(
select
StartDate as thedate
,1 as num
from DimEmployee
)
union all
(
select
EndDate as thedate
,-1 as num
from DimEmployee
where EndDate IS NOT NULL
)
) AS dates
) AS t
group by thedate
ORDER BY thedate
This worked for me; I hope it will help somebody.
I was able to get the results I was looking for with the following:
--Active Team Members by Date
SELECT "a_date",
COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo,
(SELECT DATE '2012-04-01'-1 + LEVEL AS "a_date"
FROM dual
CONNECT BY LEVEL <= DATE '2012-04-30'+2 - DATE '2012-04-01'-1
)
WHERE peo.current_employee_flag = 'Y'
AND "a_date" BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
GROUP BY "a_date"
ORDER BY "a_date"
Consider two tables:
Transactions, with amounts in a foreign currency:
Date Amount
========= =======
1/2/2009 1500
2/4/2009 2300
3/15/2009 300
4/17/2009 2200
etc.
ExchangeRates, with the value of the primary currency (let's say dollars) in the foreign currency:
Date Rate
========= =======
2/1/2009 40.1
3/1/2009 41.0
4/1/2009 38.5
5/1/2009 42.7
etc.
Exchange rates can be entered for arbitrary dates - the user could enter them on a daily basis, weekly basis, monthly basis, or at irregular intervals.
In order to translate the foreign amounts to dollars, I need to respect these rules:
A. If possible, use the most recent previous rate; so the transaction on 2/4/2009 uses the rate for 2/1/2009, and the transaction on 3/15/2009 uses the rate for 3/1/2009.
B. If there isn't a rate defined for a previous date, use the earliest rate available. So the transaction on 1/2/2009 uses the rate for 2/1/2009, since there isn't an earlier rate defined.
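The two rules can be pinned down with a small Python sketch over the sample rates before committing to SQL (hypothetical helper):

```python
from datetime import date

# ExchangeRates sample: (date, units of foreign currency per dollar)
rates = [(date(2009, 2, 1), 40.1),
         (date(2009, 3, 1), 41.0),
         (date(2009, 4, 1), 38.5),
         (date(2009, 5, 1), 42.7)]

def pick_rate(tran_date):
    previous = [r for r in rates if r[0] < tran_date]
    if previous:                                   # Rule A: most recent previous rate
        return max(previous, key=lambda r: r[0])[1]
    return min(rates, key=lambda r: r[0])[1]       # Rule B: earliest available rate

print(pick_rate(date(2009, 2, 4)))   # Rule A -> the 2/1 rate
print(pick_rate(date(2009, 3, 15)))  # Rule A -> the 3/1 rate
print(pick_rate(date(2009, 1, 2)))   # Rule B -> the earliest rate
```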
This works...
Select
t.Date,
t.Amount,
ConvertedAmount=(
Select Top 1
t.Amount/ex.Rate
From ExchangeRates ex
Where t.Date > ex.Date
Order by ex.Date desc
)
From Transactions t
... but (1) it seems like a join would be more efficient & elegant, and (2) it doesn't deal with Rule B above.
Is there an alternative to using the subquery to find the appropriate rate? And is there an elegant way to handle Rule B, without tying myself in knots?
You could first do a self-join on the exchange rates, ordered by date, so that you have the start and the end date of each exchange rate, without any overlap or gap in the dates (maybe add that as a view to your database; in my case I'm just using a common table expression).
Now joining those "prepared" rates with the transactions is simple and efficient.
Something like:
WITH IndexedExchangeRates AS (
SELECT Row_Number() OVER (ORDER BY Date) ix,
Date,
Rate
FROM ExchangeRates
),
RangedExchangeRates AS (
SELECT CASE WHEN IER.ix=1 THEN CAST('1753-01-01' AS datetime)
ELSE IER.Date
END DateFrom,
COALESCE(IER2.Date, GETDATE()) DateTo,
IER.Rate
FROM IndexedExchangeRates IER
LEFT JOIN IndexedExchangeRates IER2
ON IER.ix = IER2.ix-1
)
SELECT T.Date,
T.Amount,
RER.Rate,
T.Amount/RER.Rate ConvertedAmount
FROM Transactions T
LEFT JOIN RangedExchangeRates RER
ON (T.Date > RER.DateFrom) AND (T.Date <= RER.DateTo)
Notes:
You could replace GETDATE() with a date in the far future, I'm assuming here that no rates for the future are known.
Rule (B) is implemented by setting the date of the first known exchange rate to the minimal date supported by the SQL Server datetime, which should (by definition if it is the type you're using for the Date column) be the smallest value possible.
Suppose you had an extended exchange rate table that contained:
Start Date End Date Rate
========== ========== =======
0001-01-01 2009-01-31 40.1
2009-02-01 2009-02-28 40.1
2009-03-01 2009-03-31 41.0
2009-04-01 2009-04-30 38.5
2009-05-01 9999-12-31 42.7
We can discuss the details of whether the first two rows should be combined, but the general idea is that it is trivial to find the exchange rate for a given date. This structure works with the SQL 'BETWEEN' operator which includes the ends of the ranges. Often, a better format for ranges is 'open-closed'; the first date listed is included and the second is excluded. Note that there is a constraint on the data rows - there are (a) no gaps in the coverage of the range of dates and (b) no overlaps in the coverage. Enforcing those constraints is not completely trivial (polite understatement - meiosis).
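The no-gap/no-overlap constraint on closed ranges mentioned above amounts to checking, after sorting, that each start date is exactly one day after the previous end date; a quick sketch:

```python
from datetime import date, timedelta

# The extended exchange rate rows from the table above
ranges = [(date(1, 1, 1), date(2009, 1, 31), 40.1),
          (date(2009, 2, 1), date(2009, 2, 28), 40.1),
          (date(2009, 3, 1), date(2009, 3, 31), 41.0),
          (date(2009, 4, 1), date(2009, 4, 30), 38.5),
          (date(2009, 5, 1), date(9999, 12, 31), 42.7)]

# For closed ranges used with BETWEEN, adjacency means: next start = previous end + 1 day
contiguous = all(nxt[0] == prev[1] + timedelta(days=1)
                 for prev, nxt in zip(ranges, ranges[1:]))
print(contiguous)  # True: no gaps, no overlaps
```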
Now the basic query is trivial, and Case B is no longer a special case:
SELECT T.Date, T.Amount, X.Rate
FROM Transactions AS T JOIN ExtendedExchangeRates AS X
ON T.Date BETWEEN X.StartDate AND X.EndDate;
The tricky part is creating the ExtendedExchangeRate table from the given ExchangeRate table on the fly.
If it is an option, then revising the structure of the basic ExchangeRate table to match the ExtendedExchangeRate table would be a good idea; you resolve the messy stuff when the data is entered (once a month) instead of every time an exchange rate needs to be determined (many times a day).
How do you create the extended exchange rate table? If your system supports adding or subtracting 1 from a date value to obtain the next or previous day (and has a single-row table called 'Dual'), then a variation on this will work (without using any OLAP functions):
CREATE TABLE ExchangeRate
(
Date DATE NOT NULL,
Rate DECIMAL(10,5) NOT NULL
);
INSERT INTO ExchangeRate VALUES('2009-02-01', 40.1);
INSERT INTO ExchangeRate VALUES('2009-03-01', 41.0);
INSERT INTO ExchangeRate VALUES('2009-04-01', 38.5);
INSERT INTO ExchangeRate VALUES('2009-05-01', 42.7);
First row:
SELECT '0001-01-01' AS StartDate,
(SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
(SELECT Rate FROM ExchangeRate
WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
FROM Dual;
Result:
0001-01-01 2009-01-31 40.10000
Last row:
SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
'9999-12-31' AS EndDate,
(SELECT Rate FROM ExchangeRate
WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
FROM Dual;
Result:
2009-05-01 9999-12-31 42.70000
Middle rows:
SELECT X1.Date AS StartDate,
X2.Date - 1 AS EndDate,
X1.Rate AS Rate
FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
ON X1.Date < X2.Date
WHERE NOT EXISTS
(SELECT *
FROM ExchangeRate AS X3
WHERE X3.Date > X1.Date AND X3.Date < X2.Date
);
Result:
2009-02-01 2009-02-28 40.10000
2009-03-01 2009-03-31 41.00000
2009-04-01 2009-04-30 38.50000
Note that the NOT EXISTS sub-query is rather crucial. Without it, the 'middle rows' result is:
2009-02-01 2009-02-28 40.10000
2009-02-01 2009-03-31 40.10000 # Unwanted
2009-02-01 2009-04-30 40.10000 # Unwanted
2009-03-01 2009-03-31 41.00000
2009-03-01 2009-04-30 41.00000 # Unwanted
2009-04-01 2009-04-30 38.50000
The number of unwanted rows increases dramatically as the table increases in size: for N > 2 rows, there are N*(N-1)/2 - (N-1) = (N-1)*(N-2)/2 unwanted rows.
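A quick brute-force check of how many unwanted rows appear (a sketch; the closed form comes out to (N-1)*(N-2)/2):

```python
from itertools import combinations

# Rows produced by the middle-rows self-join without the NOT EXISTS filter:
# all pairs with X1.Date < X2.Date, versus the N-1 adjacent pairs we actually want.
def unwanted(n):
    all_pairs = len(list(combinations(range(n), 2)))  # n*(n-1)/2
    return all_pairs - (n - 1)

print(unwanted(4))  # 3 unwanted rows, matching the listing above
print(all(unwanted(n) == (n - 1) * (n - 2) // 2 for n in range(2, 50)))
```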
The result for ExtendedExchangeRate is the (disjoint) UNION of the three queries:
SELECT DATE '0001-01-01' AS StartDate,
(SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
(SELECT Rate FROM ExchangeRate
WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
FROM Dual
UNION
SELECT X1.Date AS StartDate,
X2.Date - 1 AS EndDate,
X1.Rate AS Rate
FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
ON X1.Date < X2.Date
WHERE NOT EXISTS
(SELECT *
FROM ExchangeRate AS X3
WHERE X3.Date > X1.Date AND X3.Date < X2.Date
)
UNION
SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
DATE '9999-12-31' AS EndDate,
(SELECT Rate FROM ExchangeRate
WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
FROM Dual;
On the test DBMS (IBM Informix Dynamic Server 11.50.FC6 on MacOS X 10.6.2), I was able to convert the query into a view but I had to stop cheating with the data types - by coercing the strings into dates:
CREATE VIEW ExtendedExchangeRate(StartDate, EndDate, Rate) AS
SELECT DATE('0001-01-01') AS StartDate,
(SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
(SELECT Rate FROM ExchangeRate WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
FROM Dual
UNION
SELECT X1.Date AS StartDate,
X2.Date - 1 AS EndDate,
X1.Rate AS Rate
FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
ON X1.Date < X2.Date
WHERE NOT EXISTS
(SELECT *
FROM ExchangeRate AS X3
WHERE X3.Date > X1.Date AND X3.Date < X2.Date
)
UNION
SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
DATE('9999-12-31') AS EndDate,
(SELECT Rate FROM ExchangeRate WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
FROM Dual;
I can't test this, but I think it would work. It uses coalesce with two sub-queries to pick the rate by rule A or rule B.
Select t.Date, t.Amount,
ConvertedAmount = t.Amount/coalesce(
(Select Top 1 ex.Rate
From ExchangeRates ex
Where t.Date > ex.Date
Order by ex.Date desc )
,
(select top 1 ex.Rate
From ExchangeRates ex
Order by ex.Date asc)
)
From Transactions t
SELECT
a.tranDate,
a.Amount,
a.Amount/a.Rate as convertedRate
FROM
(
SELECT
t.date tranDate,
e.date as rateDate,
t.Amount,
e.rate,
RANK() OVER (Partition BY t.date ORDER BY
CASE WHEN DATEDIFF(day,e.date,t.date) < 0 THEN
DATEDIFF(day,e.date,t.date) * -100000
ELSE DATEDIFF(day,e.date,t.date)
END ) AS diff
FROM
ExchangeRates e
CROSS JOIN
Transactions t
) a
WHERE a.diff = 1
The difference between the transaction date and the rate date is calculated; negative values (condition B) are multiplied by -100000 so that they can still be ranked, while positive values (condition A) always take priority. We then select the minimum date difference for each transaction date using the RANK() OVER clause.
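The ranking trick can be checked in Python (illustrative only, using the question's sample rates):

```python
from datetime import date

rates = [(date(2009, 2, 1), 40.1), (date(2009, 3, 1), 41.0),
         (date(2009, 4, 1), 38.5), (date(2009, 5, 1), 42.7)]

def sort_key(tran_date, rate_date):
    diff = (tran_date - rate_date).days
    # Negative diff means the rate starts after the transaction (condition B);
    # scaling by -100000 ranks those behind every condition-A candidate.
    return diff if diff >= 0 else diff * -100000

# Condition A: 2009-03-15 -> most recent earlier rate (2009-03-01)
best_a = min(rates, key=lambda r: sort_key(date(2009, 3, 15), r[0]))
# Condition B: 2009-01-02 has no earlier rate -> earliest rate (2009-02-01)
best_b = min(rates, key=lambda r: sort_key(date(2009, 1, 2), r[0]))
print(best_a[1], best_b[1])
```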
Many solutions will work. You should really find the one that works best (fastest) for your workload: do you search usually for one Transaction, list of them, all of them?
The tie-breaker solution given your schema is:
SELECT t.Date,
t.Amount,
r.Rate
--//add your multiplication/division here
FROM "Transactions" t
INNER JOIN "ExchangeRates" r
ON r."ExchangeRateID" = (
SELECT TOP 1 x."ExchangeRateID"
FROM "ExchangeRates" x
WHERE x."SourceCurrencyISO" = t."SourceCurrencyISO" --//these are currency-related filters for your tables
AND x."TargetCurrencyISO" = t."TargetCurrencyISO" --//,which you should also JOIN on
AND x."Date" <= t."Date"
ORDER BY x."Date" DESC)
You need the right indexes for this query to be fast. Also, ideally you should not JOIN on "Date" but on an "ID"-like (INTEGER) field. Give me more schema info and I will create an example for you.
There's nothing about a join that will be more elegant than the TOP 1 correlated subquery in your original post. However, as you say, it doesn't satisfy requirement B.
These queries do work (SQL Server 2005 or later required). See the SqlFiddle for these.
SELECT
T.*,
ExchangeRate = E.Rate
FROM
dbo.Transactions T
CROSS APPLY (
SELECT TOP 1 Rate
FROM dbo.ExchangeRate E
ORDER BY
CASE WHEN E.RateDate <= T.TranDate THEN 0 ELSE 1 END,
CASE WHEN E.RateDate <= T.TranDate THEN E.RateDate END DESC,
E.RateDate
) E;
Note that the CROSS APPLY with a single column value is functionally equivalent to the correlated subquery in the SELECT clause as you showed. I just prefer CROSS APPLY now because it is much more flexible and lets you reuse the value in multiple places, have multiple rows in it (for custom unpivoting) and lets you have multiple columns.
SELECT
T.*,
ExchangeRate = Coalesce(E.Rate, E2.Rate)
FROM
dbo.Transactions T
OUTER APPLY (
SELECT TOP 1 Rate
FROM dbo.ExchangeRate E
WHERE E.RateDate <= T.TranDate
ORDER BY E.RateDate DESC
) E
OUTER APPLY (
SELECT TOP 1 Rate
FROM dbo.ExchangeRate E2
WHERE E.Rate IS NULL
ORDER BY E2.RateDate
) E2;
I don't know which one might perform better, or if either will perform better than other answers on the page. With a proper index on the Date columns, they should zing pretty well--definitely better than any Row_Number() solution.