I have the table below with Swedish swap rates.
I am building a forward-looking PnL report and therefore want to know what my basis rate would be on the maturity date of a bond, so I can evaluate whether we should reinvest immediately on the maturity date or not.
Let's say bond X matures on 2017-03-01; I would then like to know the interpolated value for that day, based on the 3-month and 6-month rates.
I have around 130 different bonds with different maturity dates of up to 5 years.
Is there a smooth way, based on the values below, to interpolate every single day up to 5 years?
name   | ccy | price  | datedays | timeband  | Rate_Date
STIBOR | SEK | -0.562 |        1 | OVERNIGHT | 2016-10-07
STIBOR | SEK | -0.559 |        7 | 1 WEEK    | 2016-10-13
STIBOR | SEK | -0.631 |       32 | 1 MONTH   | 2016-11-07
STIBOR | SEK | -0.577 |       61 | 2 MONTHS  | 2016-12-06
STIBOR | SEK | -0.741 |       95 | 3 MONTHS  | 2017-01-09
STIBOR | SEK | -0.349 |      182 | 6 MONTHS  | 2017-04-06
SWAP   | SEK | -0.499 |      369 | 1 YEAR    | 2017-10-10
SWAP   | SEK | -0.403 |      734 | 2 YEARS   | 2018-10-10
SWAP   | SEK | -0.285 |     1099 | 3 YEARS   | 2019-10-10
SWAP   | SEK | -0.151 |     1467 | 4 YEARS   | 2020-10-12
SWAP   | SEK |  0.003 |     1831 | 5 YEARS   | 2021-10-11
See Value1 and Value2 below, which are interpolated.
Value1 is calculated like this:
Interpolating factor = ( Price(Stibor2M) - Price(Stibor1M) ) / ( datedays(2M) - datedays(1M) )
-->
(-0.577 - -0.631) / (61 - 32) = 0.0018621
So, between Stibor 1M and Stibor 2M, we add this amount every day.
To see how many days there are between the first measure and the desired measure:
Datedays(Value1) - Datedays(Stibor 1M) = 50 - 32 = 18
The interpolated value for 2016-11-25 will then be:
Price(Stibor1M) + 18 * interpolating factor = -0.631 + 18 * 0.0018621 = -0.597483
name   | ccy | price  | datedays | timeband  | Rate_Date
STIBOR | SEK | -0.562 |        1 | OVERNIGHT | 2016-10-07
STIBOR | SEK | -0.559 |        7 | 1 WEEK    | 2016-10-13
STIBOR | SEK | -0.631 |       32 | 1 MONTH   | 2016-11-07
Value1 | SEK | -0.597 |       50 | -         | 2016-11-25
STIBOR | SEK | -0.577 |       61 | 2 MONTHS  | 2016-12-06
STIBOR | SEK | -0.741 |       95 | 3 MONTHS  | 2017-01-09
Value2 | SEK | -0.607 |      146 | -         | 2017-03-01
STIBOR | SEK | -0.349 |      182 | 6 MONTHS  | 2017-04-06
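For what it's worth, the same linear interpolation can be expressed directly in SQL. Below is a minimal sketch, assuming the quotes above live in a table named rates(name, ccy, price, datedays, rate_date); the table name and the one-row target subquery are illustrative, not from the original post:

-- Bracket the target day count between the nearest quoted tenors,
-- then interpolate linearly between them.
SELECT t.target_days,
       lo.price
       + (t.target_days - lo.datedays)
         * (hi.price - lo.price) / (hi.datedays - lo.datedays) AS interpolated_rate
FROM (SELECT 50 AS target_days) t   -- 2016-11-25 in the worked example
JOIN rates lo
  ON lo.datedays = (SELECT MAX(datedays) FROM rates WHERE datedays <= t.target_days)
JOIN rates hi
  ON hi.datedays = (SELECT MIN(datedays) FROM rates WHERE datedays > t.target_days);

With the quotes above this returns -0.597483, matching Value1. Joining against a list of the 130 maturity day counts (instead of the one-row subquery) would interpolate every bond in one pass.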
I want to track the yearly spend of users based on a particular month from which their cycle starts, so that they don't exceed the allowed limits. I have the following two tables:
Spend (Contains 1 row per user per month) (I can modify the date column of this table to any date format as needed, if it helps):
+----+-----------+------+-------+-------+
| ID | Date      | Year | Month | Spend |
+----+-----------+------+-------+-------+
| 11 | 01-Sep-19 | 2019 | 9     | 10    |
| 11 | 01-Oct-19 | 2019 | 10    | 23    |
| 11 | 01-Nov-19 | 2019 | 11    | 27    |
| 11 | 01-Dec-19 | 2019 | 12    | 14    |
| 11 | 01-Jan-20 | 2020 | 1     | 13    |
| 11 | 01-Feb-20 | 2020 | 2     | 33    |
| 11 | 01-Mar-20 | 2020 | 3     | 25    |
| 11 | 01-Apr-20 | 2020 | 4     | 17    |
| 11 | 01-May-20 | 2020 | 5     | 14    |
| 11 | 01-Jun-20 | 2020 | 6     | 10    |
| 11 | 01-Jul-20 | 2020 | 7     | 46    |
| 11 | 01-Aug-20 | 2020 | 8     | 53    |
| 11 | 01-Sep-20 | 2020 | 9     | 38    |
| 11 | 01-Oct-20 | 2020 | 10    | 22    |
| 11 | 01-Nov-20 | 2020 | 11    | 29    |
| 50 | 01-Jul-20 | 2020 | 7     | 56    |
| 50 | 01-Aug-20 | 2020 | 8     | 62    |
| 50 | 01-Sep-20 | 2020 | 9     | 77    |
| 50 | 01-Oct-20 | 2020 | 10    | 52    |
| 50 | 01-Nov-20 | 2020 | 11    | 45    |
+----+-----------+------+-------+-------+
Billing Cycle (contains the months between which we calculate their total spend):
+----+------------+----------+
| ID | StartMonth | EndMonth |
+----+------------+----------+
| 11 | 10         | 9        |
| 50 | 9          | 8        |
+----+------------+----------+
Sample Output:
+----+-------+------------+
| ID | Cycle | TotalSpend |
+----+-------+------------+
| 11 | 1     | 10         |
| 11 | 2     | 313        |
| 11 | 3     | 51         |
| 50 | 1     | 118        |
| 50 | 2     | 174        |
+----+-------+------------+
In the sample output, for ID = 11, cycle 1 indicates the spend in Sep'19, cycle 2 the total spend from Oct'19 (month 10) to Sep'20 (month 9), and cycle 3 the total spend for the next 12 months starting Oct'20 (up to whichever month data is present).
I'm a beginner at SQL, and I believe doing this might require the use of CTEs/subqueries. I would appreciate any help or guidance on this.
Since this seems to be an exercise of some sort, I'm not going to provide a full answer, but I'll give you hints on how this could be solved conceptually.
First, I think you should associate entries with their effective cycles (including the cycle number) for the required date range. This could be done using a recursive CTE. Recursive CTEs are not the most efficient approach, but since we don't have the effective cycles with their numbers as a distinct table, they can be a working solution nevertheless.
The result then just needs to be grouped by ID and cycle number with the amounts summed up, and you're done; a rough sketch of the overall shape follows below.
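For illustration only, and to keep with the hint spirit rather than hand over a turnkey answer: one non-recursive shape for the cycle association is to shift each month index by StartMonth so that billing years fall into calendar-style buckets, and then number those buckets per user. This is an untested sketch assuming MySQL 8+ (or any database with window functions), and it assumes the billing-cycle table is named BillingCycle; adjust names to your schema:

-- Sketch: bucket each month into its billing year, then rank the buckets.
WITH shifted AS (
    SELECT s.ID,
           s.Spend,
           -- month index shifted so each billing year starts at StartMonth
           FLOOR((s.Year * 12 + s.Month - b.StartMonth) / 12) AS cycle_year
    FROM Spend s
    JOIN BillingCycle b ON b.ID = s.ID
)
SELECT ID,
       DENSE_RANK() OVER (PARTITION BY ID ORDER BY cycle_year) AS Cycle,
       SUM(Spend) AS TotalSpend
FROM shifted
GROUP BY ID, cycle_year
ORDER BY ID, Cycle;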
I am trying to add a rank column per group, with the rank repeated on every row within the group in the original table, rather than on the summed-up shape.
I found the formula below on another site, but it shows an error:
https://intellipaat.com/community/9734/rank-categories-by-sum-power-bi
Table1
+-----------+------------+-------+
| product | date | sales |
+-----------+------------+-------+
| coffee | 11/03/2019 | 15 |
| coffee | 12/03/2019 | 10 |
| coffee | 13/03/2019 | 28 |
| coffee | 14/03/2019 | 1 |
| tea | 11/03/2019 | 5 |
| tea | 12/03/2019 | 2 |
| tea | 13/03/2019 | 6 |
| tea | 14/03/2019 | 7 |
| Chocolate | 11/03/2019 | 30 |
| Chocolate | 11/03/2019 | 4 |
| Chocolate | 11/03/2019 | 15 |
| Chocolate | 11/03/2019 | 10 |
+-----------+------------+-------+
The Goal
+-----------+------------+-------+-----+------+
| product | date | sales | sum | rank |
+-----------+------------+-------+-----+------+
| coffee | 11/03/2019 | 15 | 54 | 5 |
| coffee | 12/03/2019 | 10 | 54 | 5 |
| coffee | 13/03/2019 | 28 | 54 | 5 |
| coffee | 14/03/2019 | 1 | 54 | 5 |
| tea | 11/03/2019 | 5 | 20 | 9 |
| tea | 12/03/2019 | 2 | 20 | 9 |
| tea | 13/03/2019 | 6 | 20 | 9 |
| tea | 14/03/2019 | 7 | 20 | 9 |
| Chocolate | 11/03/2019 | 30 | 59 | 1 |
| Chocolate | 11/03/2019 | 4 | 59 | 1 |
| Chocolate | 11/03/2019 | 15 | 59 | 1 |
| Chocolate | 11/03/2019 | 10 | 59 | 1 |
+-----------+------------+-------+-----+------+
The script
sum =
SUMX(
    FILTER(
        Table1;
        Table1[product] = EARLIER(Table1[product])
    );
    Table1[sales]
)
The Error:
EARLIER(Table1[product]) # Parameter is not correct type cannot find name 'product'
What's wrong with the script above ?
* I am also not able to test this script until the sum approach is fixed:
rank = RANKX( ALL(Table1); Table1[sum]; ;; "Dense" )
The script is designed for a calculated column, not a measure. If you enter it as a measure, EARLIER has no "previous" row context to refer to, which gives you the error.
Create a measure:
Total Sales = SUM(Table1[sales])
This measure will be used to show sales.
Create another measure:
Sales by Product =
SUMX(
    VALUES(Table1[product]);
    CALCULATE([Total Sales]; ALL(Table1[date]))
)
This measure will show sales by product ignoring dates.
Third measure:
Sale Rank =
RANKX(
    ALL(Table1[product]; Table1[date]);
    [Sales by Product];; DESC; Dense
)
Create a report with product and dates on a pivot, and drop all three measures into it.
Tweak RANKX parameters to change the ranking mode, if necessary.
I have my dataset in the format given below.
It's month-level data, along with a salary for each month.
I need to calculate the cumulative salary as of each month end. How can I do this?
+----------+-------+--------+---------------+
| Account | Month | Salary | Running Total |
+----------+-------+--------+---------------+
| a | 1 | 586 | 586 |
| a | 2 | 928 | 1514 |
| a | 3 | 726 | 2240 |
| a | 4 | 538 | 538 |
| b | 1 | 956 | 1494 |
| b | 3 | 667 | 2161 |
| b | 4 | 841 | 3002 |
| c | 1 | 826 | 826 |
| c | 2 | 558 | 1384 |
| c | 3 | 558 | 1972 |
| c | 4 | 735 | 2707 |
| c | 5 | 691 | 3398 |
| d | 1 | 670 | 670 |
| d | 4 | 838 | 1508 |
| d | 5 | 1000 | 2508 |
+----------+-------+--------+---------------+
I need to calculate the Running Total column, which is a cumulative sum. How can I do this efficiently in SQL?
You can use SUM with an ORDER BY clause inside the OVER clause:
SELECT Account, Month, Salary,
SUM(Salary) OVER (PARTITION BY Account ORDER BY Month) AS RunningTotal
FROM mytable
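One caveat worth adding (not from the original answer): when ORDER BY is present, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so duplicate Month values within an account are treated as peers and summed together. Spelling out a ROWS frame gives a strictly row-by-row running total:

SELECT Account, Month, Salary,
       SUM(Salary) OVER (PARTITION BY Account ORDER BY Month
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
FROM mytable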
What I'm trying to achieve: rolling total for quantity and amount for a given day, grouped by hour.
It's easy in most cases, but if you have some additional columns (dir and product in my case) and you don't want to group/filter on them, that's a problem.
I know there are extensions in Oracle and MSSQL specifically for that, and there's SELECT ... OVER (PARTITION BY ...) in Postgres.
At the moment I'm working on an app prototype, and it's backed by MySQL, and I have no idea what it will be using in production, so I'm trying to avoid vendor lock-in.
The entire table:
> SELECT id, dir, product, date, hour, quantity, amount FROM sales
ORDER BY date, hour;
+------+-----+---------+------------+------+----------+--------+
| id | dir | product | date | hour | quantity | amount |
+------+-----+---------+------------+------+----------+--------+
| 2230 | 65 | ABCDEDF | 2014-09-11 | 1 | 1 | 10 |
| 2231 | 64 | ABCDEDF | 2014-09-11 | 3 | 4 | 40 |
| 2232 | 64 | ABCDEDF | 2014-09-11 | 5 | 5 | 50 |
| 2235 | 64 | ZZ | 2014-09-11 | 7 | 6 | 60 |
| 2233 | 64 | ABCDEDF | 2014-09-11 | 7 | 6 | 60 |
| 2237 | 66 | ABCDEDF | 2014-09-11 | 7 | 6 | 60 |
| 2234 | 64 | ZZ | 2014-09-18 | 3 | 1 | 11 |
| 2236 | 66 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2227 | 64 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2228 | 64 | ABCDEDF | 2014-09-18 | 5 | 2 | 200 |
| 2229 | 64 | ABCDEDF | 2014-09-18 | 7 | 3 | 300 |
+------+-----+---------+------------+------+----------+--------+
For a given date:
> SELECT id, dir, product, date, hour, quantity, amount FROM sales
WHERE date = '2014-09-18'
ORDER BY hour;
+------+-----+---------+------------+------+----------+--------+
| id | dir | product | date | hour | quantity | amount |
+------+-----+---------+------------+------+----------+--------+
| 2227 | 64 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2236 | 66 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2234 | 64 | ZZ | 2014-09-18 | 3 | 1 | 11 |
| 2228 | 64 | ABCDEDF | 2014-09-18 | 5 | 2 | 200 |
| 2229 | 64 | ABCDEDF | 2014-09-18 | 7 | 3 | 300 |
+------+-----+---------+------------+------+----------+--------+
The results that I need, using sub-select:
> SELECT date, hour, SUM(quantity),
( SELECT SUM(quantity) FROM sales s2
WHERE s2.hour <= s1.hour AND s2.date = s1.date
) AS total
FROM sales s1
WHERE s1.date = '2014-09-18'
GROUP by date, hour;
+------------+------+---------------+-------+
| date | hour | sum(quantity) | total |
+------------+------+---------------+-------+
| 2014-09-18 | 3 | 3 | 3 |
| 2014-09-18 | 5 | 2 | 5 |
| 2014-09-18 | 7 | 3 | 8 |
+------------+------+---------------+-------+
My concerns about using a sub-select:
once there are around a million records in the table, the query may become too slow; I'm not sure whether it's subject to optimization, even though it has no HAVING clauses.
if I had to filter on a product or dir, I would have to put those conditions into both the main SELECT and the sub-SELECT (WHERE product = / WHERE dir =).
a sub-select can only compute a single sum, while I need two of them (SUM(quantity) and SUM(amount)) (ERROR 1241 (21000): Operand should contain 1 column(s)).
The closest result I was able to get uses a JOIN:
> SELECT DISTINCT(s1.hour) AS ih, s2.date, s2.hour, s2.quantity, s2.amount, s2.id
FROM sales s1
JOIN sales s2 ON s2.date = s1.date AND s2.hour <= s1.hour
WHERE s1.date = '2014-09-18'
ORDER by ih;
+----+------------+------+----------+--------+------+
| ih | date | hour | quantity | amount | id |
+----+------------+------+----------+--------+------+
| 3 | 2014-09-18 | 3 | 1 | 100 | 2236 |
| 3 | 2014-09-18 | 3 | 1 | 100 | 2227 |
| 3 | 2014-09-18 | 3 | 1 | 11 | 2234 |
| 5 | 2014-09-18 | 3 | 1 | 100 | 2236 |
| 5 | 2014-09-18 | 3 | 1 | 100 | 2227 |
| 5 | 2014-09-18 | 5 | 2 | 200 | 2228 |
| 5 | 2014-09-18 | 3 | 1 | 11 | 2234 |
| 7 | 2014-09-18 | 3 | 1 | 100 | 2236 |
| 7 | 2014-09-18 | 3 | 1 | 100 | 2227 |
| 7 | 2014-09-18 | 5 | 2 | 200 | 2228 |
| 7 | 2014-09-18 | 7 | 3 | 300 | 2229 |
| 7 | 2014-09-18 | 3 | 1 | 11 | 2234 |
+----+------------+------+----------+--------+------+
I could stop here and just use those results: group by ih (hour), calculate the sums for quantity and amount, and be happy. But something nags at me, saying that this is wrong.
If I remove DISTINCT, most rows become duplicated. Replacing the JOIN with its variants doesn't help.
Once I remove s2.id from the statement, I get a complete mess, with meaningful rows disappearing or collapsing (e.g. ids 2236/2227 get collapsed):
> SELECT DISTINCT(s1.hour) AS ih, s2.date, s2.hour, s2.quantity, s2.amount
FROM sales s1
JOIN sales s2 ON s2.date = s1.date AND s2.hour <= s1.hour
WHERE s1.date = '2014-09-18'
ORDER by ih;
+----+------------+------+----------+--------+
| ih | date | hour | quantity | amount |
+----+------------+------+----------+--------+
| 3 | 2014-09-18 | 3 | 1 | 100 |
| 3 | 2014-09-18 | 3 | 1 | 11 |
| 5 | 2014-09-18 | 3 | 1 | 100 |
| 5 | 2014-09-18 | 5 | 2 | 200 |
| 5 | 2014-09-18 | 3 | 1 | 11 |
| 7 | 2014-09-18 | 3 | 1 | 100 |
| 7 | 2014-09-18 | 5 | 2 | 200 |
| 7 | 2014-09-18 | 7 | 3 | 300 |
| 7 | 2014-09-18 | 3 | 1 | 11 |
+----+------------+------+----------+--------+
Summing doesn't help; it only adds to the mess.
The first row (hour = 3) should have SUM(s2.quantity) equal to 3, but it has 9. What SUM(s1.quantity) shows is a complete mystery to me.
> SELECT DISTINCT(s1.hour) AS hour, sum(s1.quantity), s2.date, SUM(s2.quantity)
FROM sales s1 JOIN sales s2 ON s2.date = s1.date AND s2.hour <= s1.hour
WHERE s1.date = '2014-09-18'
GROUP BY hour;
+------+------------------+------------+------------------+
| hour | sum(s1.quantity) | date | sum(s2.quantity) |
+------+------------------+------------+------------------+
| 3 | 9 | 2014-09-18 | 9 |
| 5 | 8 | 2014-09-18 | 5 |
| 7 | 15 | 2014-09-18 | 8 |
+------+------------------+------------+------------------+
Bonus points/boss level:
I also need a column that shows total_reference: the same rolling total over the same periods, but for a different date (e.g. 2014-09-11).
If you want a cumulative sum in MySQL, the most efficient way is to use variables:
SELECT date, hour,
(#q := q + #q) as cumeq, (#a := a + #a) as cumea
FROM (SELECT date, hour, SUM(quantity) as q, SUM(amount) as a
FROM sales s
WHERE s.date = '2014-09-18'
GROUP by date, hour
) dh cross join
(select #q := 0, #a := 0) vars
ORDER BY date, hour;
If you are planning on working with databases such as Oracle, SQL Server, and Postgres, then you should develop against a database that is closer in functionality and supports ANSI-standard window functions. The right way to do this is with window functions, but MySQL doesn't support those. Postgres, SQL Server, and Oracle all have free versions that you can use for development purposes.
Also, with proper indexing, you shouldn't have a problem with the subquery approach, even on large tables.
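For reference, here is what the window-function version of the cumulative sums looks like; it runs on Postgres, SQL Server, and Oracle, and also on MySQL 8.0+, which has since added window functions (it had none at the time of the question):

SELECT date, hour,
       SUM(quantity) AS q,
       SUM(amount) AS a,
       -- aggregate per (date, hour) group, then accumulate across hours
       SUM(SUM(quantity)) OVER (PARTITION BY date ORDER BY hour) AS cume_q,
       SUM(SUM(amount))   OVER (PARTITION BY date ORDER BY hour) AS cume_a
FROM sales
WHERE date = '2014-09-18'
GROUP BY date, hour
ORDER BY date, hour;

This also sidesteps the fragility of the user-variable approach, whose reliance on expression evaluation order is deprecated in recent MySQL versions.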
I had a rather large error in my previous question:
select earliest date from multiple rows
The answer by horse_with_no_name returns a perfect result, and I am hugely appreciative; however, I got my own initial question wrong, so I really apologise. If you look at the table below:
circuit_uid |customer_name |rack_location |reading_date | reading_time | amps | volts | kw | kwh | kva | pf | key
--------------------------------------------------------------------------------------------------------------------------------------
cu1.cb1.r1 | Customer 1 | 12.01.a1 | 2012-01-02 | 00:01:01 | 4.51 | 229.32 | 1.03 | 87 | 1.03 | 0.85 | 15
cu1.cb1.r1 | Customer 1 | 12.01.a1 | 2012-01-02 | 01:01:01 | 4.18 | 230.3 | 0.96 | 90 | 0.96 | 0.84 | 16
cu1.cb1.r2 | Customer 1 | 12.01.a1 | 2012-01-02 | 00:01:01 | 4.51 | 229.32 | 1.03 | 21 | 1.03 | 0.85 | 15
cu1.cb1.r2 | Customer 1 | 12.01.a1 | 2012-01-02 | 01:01:01 | 4.18 | 230.3 | 0.96 | 23 | 0.96 | 0.84 | 16
cu1.cb1.s2 | Customer 2 | 10.01.a1 | 2012-01-02 | 00:01:01 | 7.34 | 228.14 | 1.67 | 179 | 1.67 | 0.88 | 24009
cu1.cb1.s2 | Customer 2 | 10.01.a1 | 2012-01-02 | 01:01:01 | 9.07 | 228.4 | 2.07 | 182 | 2.07 | 0.85 | 24010
cu1.cb1.s3 | Customer 2 | 10.01.a1 | 2012-01-02 | 00:01:01 | 7.34 | 228.14 | 1.67 | 121 | 1.67 | 0.88 | 24009
cu1.cb1.s3 | Customer 2 | 10.01.a1 | 2012-01-02 | 01:01:01 | 9.07 | 228.4 | 2.07 | 124 | 2.07 | 0.85 | 24010
cu1.cb1.r1 | Customer 3 | 01.01.a1 | 2012-01-02 | 00:01:01 | 7.32 | 229.01 | 1.68 | 223 | 1.68 | 0.89 | 48003
cu1.cb1.r1 | Customer 3 | 01.01.a1 | 2012-01-02 | 01:01:01 | 6.61 | 228.29 | 1.51 | 226 | 1.51 | 0.88 | 48004
cu1.cb1.r4 | Customer 3 | 01.01.a1 | 2012-01-02 | 00:01:01 | 7.32 | 229.01 | 1.68 | 215 | 1.68 | 0.89 | 48003
cu1.cb1.r4 | Customer 3 | 01.01.a1 | 2012-01-02 | 01:01:01 | 6.61 | 228.29 | 1.51 | 217 | 1.51 | 0.88 | 48004
As you can see, each customer now has multiple circuits. The result should now be the sum of the earliest kwh reading for each circuit per customer, so the result from this table would be:
customer_name | kwh(sum)
--------------+-----------
customer 1 | 108 (the result of 87 + 21)
customer 2 | 300 (the result of 179 + 121)
customer 3 | 438 (the result of 223 + 215)
There will be more than 2 circuits per customer, and the readings can happen at varying times, hence the need for the 'earliest' reading.
Would anybody have any suggestions for the revised question?
PostgreSQL 8.4 on CentOs/Redhat.
SELECT customer_name, sum(kwh) AS kwh_total
FROM (
SELECT DISTINCT ON (customer_name, circuit_uid)
customer_name, circuit_uid, kwh
FROM readings
WHERE reading_date = '2012-01-02'::date
ORDER BY customer_name, circuit_uid, reading_time
) x
GROUP BY 1
Same as before: just pick the earliest reading per (customer_name, circuit_uid), then sum per customer_name.
Index
A multi-column index like the following will make this very fast:
CREATE INDEX readings_multi_idx
ON readings(reading_date, customer_name, circuit_uid, reading_time);
This is an extension to your original question:
select customer_name,
sum(kwh)
from (
select customer_name,
kwh,
reading_time,
reading_date,
row_number() over (partition by customer_name, circuit_uid order by reading_time) as rn
from readings
where reading_date = date '2012-01-02'
) t
where rn = 1
group by customer_name
Note the new sum() in the outer query and the changed partition by definition in the inner query (compared to your previous question), which now calculates the first reading for each circuit_uid (instead of the first for each customer).