Distinct lists on dates where an ID is present (i.e. intersects) on consecutive dates - sql

I'm trying to make an MSSQL query that produces lists of apartment prices. The ultimate goal of the query is to calculate the percentage change in average prices of apartments. However, this final calculation (namely taking averages) is something I can fix in code provided that the list(s) of prices that are retrieved are correct.
What makes this tricky is that apartments are sold and new ones added all the time, so when comparing prices from week to week (I have weekly data), I only want to compare prices for apartments that have a recorded price in weeks (t-1, t), (t, t+1), (t+1,t+2) etc. In other words, some apartments that had a recorded price in time (t-1) might not be there at time t, and some apartments may have been added at time t (and thus weren't there at time t-1). I only want to select prices in week t-1 and t where some ApartmentID exists in both week t-1 and t to calculate the average change in week t.
Example data
-------------------------------------------------------------
| RegistrationID | Date | Price | ApartmentID |
-------------------------------------------------------------
| 1 | 2014-04-04 | 5 | 1 |
| 2 | 2014-04-04 | 6 | 2 |
| 3 | 2014-04-04 | 4 | 3 |
| 4 | 2014-04-11 | 5.2 | 1 |
| 5 | 2014-04-11 | 4 | 3 |
| 6 | 2014-04-11 | 7 | 4 |
| 7 | 2014-04-19 | 5.1 | 1 |
| 8 | 2014-04-19 | 4.1 | 3 |
| 9 | 2014-04-19 | 7.1 | 4 |
| 10 | 2014-04-26 | 4.1 | 3 |
| 11 | 2014-04-26 | 7.2 | 4 |
-------------------------------------------------------------
Solution thoughts
I think it makes sense to produce two different lists, one for odd-numbered weeks and one for even-numbered weeks. List 1 would then contain Date, Price and ApartmentID that are valid for the tuples (t-1,t), (t+1,t+2), (t+3,t+4) etc. while list 2 would contain the same for the tuples (t,t+1),(t+2,t+3),(t+4,t+5) etc. The reason I think two lists are needed is that for any given week t, there are two sets of apartments and corresponding prices that need to be produced - one that is "forward compatible" and one that is "backwards compatible".
If two such lists can be produced, then the rest is simply an exercise in taking averages over each distinct date.
I'm not really sure where to begin here. I played around a little with INTERSECT, but I'm pretty sure I need to nest queries to get this to work.
Result
Using the methodology described above would yield two lists.
List 1:
Notice how RegistrationID 2 and 6 disappear because their apartments don't exist on both dates 2014-04-04 and 2014-04-11. The same goes for RegistrationID 7, as this apartment doesn't exist on both 2014-04-19 and 2014-04-26.
-------------------------------------------------------------
| RegistrationID | Date | Price | ApartmentID |
-------------------------------------------------------------
| 1 | 2014-04-04 | 5 | 1 |
| 3 | 2014-04-04 | 4 | 3 |
| 4 | 2014-04-11 | 5.2 | 1 |
| 5 | 2014-04-11 | 4 | 3 |
| 8 | 2014-04-19 | 4.1 | 3 |
| 9 | 2014-04-19 | 7.1 | 4 |
| 10 | 2014-04-26 | 4.1 | 3 |
| 11 | 2014-04-26 | 7.2 | 4 |
-------------------------------------------------------------
List 2:
Here, nothing disappears because every apartment is present in the tuples within the scope of this list.
-------------------------------------------------------------
| RegistrationID | Date | Price | ApartmentID |
-------------------------------------------------------------
| 4 | 2014-04-11 | 5.2 | 1 |
| 5 | 2014-04-11 | 4 | 3 |
| 6 | 2014-04-11 | 7 | 4 |
| 7 | 2014-04-19 | 5.1 | 1 |
| 8 | 2014-04-19 | 4.1 | 3 |
| 9 | 2014-04-19 | 7.1 | 4 |
-------------------------------------------------------------

Here's a solution. First, I get all the records from the table (I named it "ApartmentPrice"), computing WeekOf (a date that identifies the record's week; with the expression below it is the Monday inside that Sunday-to-Saturday week), PreviousWeek (the same anchor date for the previous week), and NextWeek (the anchor date for the following week). I store that in a table variable (you could also put it in a CTE or a temp table).
declare @tempTable table(RegistrationId int, PriceDate date, Price decimal(8,2), ApartmentId int, WeekOf date, PreviousWeek date, NextWeek date)
insert @tempTable
select ap.RegistrationId,
       ap.PriceDate,
       ap.Price,
       ap.ApartmentId,
       DATEADD(ww, DATEDIFF(ww, 0, ap.PriceDate), 0) WeekOf,
       DATEADD(ww, DATEDIFF(ww, 0, dateadd(wk, -1, ap.PriceDate)), 0) PreviousWeek,
       DATEADD(ww, DATEDIFF(ww, 0, dateadd(wk, 1, ap.PriceDate)), 0) NextWeek
from ApartmentPrice ap
Then I join that table variable to itself where WeekOf equals either NextWeek or PreviousWeek. This gives the apartments that have a record in an adjoining week.
select distinct t.RegistrationId, t.PriceDate, t.Price, t.ApartmentId
from @tempTable t
join @tempTable t2 on t.ApartmentId = t2.ApartmentId and (t.WeekOf = t2.PreviousWeek or t.WeekOf = t2.NextWeek)
order by t.RegistrationId, t.ApartmentId, t.PriceDate
I'm using distinct because a record appears twice in the join when its apartment has records in both the previous and the following week.
You can also find the average prices for each week like this:
-- exists keeps each record once; avg(distinct t.Price) over the join would
-- also merge different apartments that happen to share the same price
select t.WeekOf, avg(t.Price) as AvgPrice
from @tempTable t
where exists (select 1
              from @tempTable t2
              where t2.ApartmentId = t.ApartmentId
                and (t.WeekOf = t2.PreviousWeek or t.WeekOf = t2.NextWeek))
group by t.WeekOf
order by t.WeekOf
Here's a SQL Fiddle. I added a few more rows to the test data to show that it handles dates that cross the end of the year boundary.
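For the pairwise comparison the question describes, one option is to pair each recording date with the next one and keep only apartments present on both. The following is a minimal sketch of that idea, not the answer's method: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, loads the question's sample rows, and assumes "consecutive" means "the next distinct Date in the table" (the sample dates are not all exactly seven days apart).

```python
import sqlite3

# Sample rows from the question: (RegistrationID, Date, Price, ApartmentID).
ROWS = [
    (1, "2014-04-04", 5.0, 1), (2, "2014-04-04", 6.0, 2), (3, "2014-04-04", 4.0, 3),
    (4, "2014-04-11", 5.2, 1), (5, "2014-04-11", 4.0, 3), (6, "2014-04-11", 7.0, 4),
    (7, "2014-04-19", 5.1, 1), (8, "2014-04-19", 4.1, 3), (9, "2014-04-19", 7.1, 4),
    (10, "2014-04-26", 4.1, 3), (11, "2014-04-26", 7.2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ApartmentPrice "
    "(RegistrationID INTEGER, Date TEXT, Price REAL, ApartmentID INTEGER)"
)
conn.executemany("INSERT INTO ApartmentPrice VALUES (?, ?, ?, ?)", ROWS)

# Join each record to the same apartment's record on the next recording date.
# Apartments missing on either side of a pair simply produce no joined row.
PAIRWISE_SQL = """
SELECT cur.Date        AS week,
       AVG(prev.Price) AS prev_avg,
       AVG(cur.Price)  AS cur_avg
FROM ApartmentPrice AS prev
JOIN ApartmentPrice AS cur
  ON cur.ApartmentID = prev.ApartmentID
 AND cur.Date = (SELECT MIN(Date) FROM ApartmentPrice WHERE Date > prev.Date)
GROUP BY cur.Date
ORDER BY cur.Date
"""

# Percentage change of the average price, per week, over matched apartments only.
pct_change = {
    week: (cur_avg / prev_avg - 1.0) * 100.0
    for week, prev_avg, cur_avg in conn.execute(PAIRWISE_SQL)
}
```

Because every joined row carries both the earlier and the later price, the two lists from the question never have to be materialised separately.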

Related

Complex nested aggregations to get order totals

I have a system to track orders and related expenditures. This is a Rails app running on PostgreSQL. 99% of my app gets by with plain old Rails Active Record calls etc. This one is ugly.
The expenditures table looks like this:
+----+----------+-----------+------------------------+
| id | category | parent_id | note |
+----+----------+-----------+------------------------+
| 1 | order | nil | order with no invoices |
+----+----------+-----------+------------------------+
| 2 | order | nil | order with invoices |
+----+----------+-----------+------------------------+
| 3 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
| 4 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
Each expenditure has many expenditure_items, and orders can be parents to invoices. The expenditure_items table looks like this:
+----+----------------+-------------+-------+---------+
| id | expenditure_id | cbs_item_id | total | note |
+----+----------------+-------------+-------+---------+
| 1 | 1 | 1 | 5 | Fuit |
+----+----------------+-------------+-------+---------+
| 2 | 1 | 2 | 15 | Veggies |
+----+----------------+-------------+-------+---------+
| 3 | 2 | 1 | 123 | Fuit |
+----+----------------+-------------+-------+---------+
| 4 | 2 | 2 | 456 | Veggies |
+----+----------------+-------------+-------+---------+
| 5 | 3 | 1 | 34 | Fuit |
+----+----------------+-------------+-------+---------+
| 6 | 3 | 2 | 76 | Veggies |
+----+----------------+-------------+-------+---------+
| 7 | 4 | 1 | 26 | Fuit |
+----+----------------+-------------+-------+---------+
| 8 | 4 | 2 | 98 | Veggies |
+----+----------------+-------------+-------+---------+
I need to track a few things:
amounts left to be invoiced on orders (that's easy)
the above, but rolled up for each cbs_item_id (this is the ugly part)
The cbs_item_id is basically an accounting code to categorize the money spent etc. I have visualized what my end result would look like:
+-------------+----------------+-------------+---------------------------+-----------+
| cbs_item_id | expenditure_id | order_total | invoice_total | remaining |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 1 | 5 | 0 | 5 |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 2 | 123 | 60 | 63 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 1 | 68 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 1 | 15 | 0 | 15 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 2 | 456 | 174 | 282 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 2 | 297 |
+-------------+----------------+-------------+---------------------------+-----------+
order_total is the sum of total for all the expenditure_items of the given order (category = 'order'). invoice_total is the sum of total for all the expenditure_items of invoices with parent_id = expenditures.id. Remaining is calculated as the difference (but not less than 0). In real terms, the idea here is that you place an order for $1000 and $750 of invoices come in. I need to calculate the $250 left on the order (remaining), broken down into each category (cbs_item_id). Then I need the roll-up of all the remaining values grouped by the cbs_item_id.
So for each cbs_item_id I need to group by each order, find the total for the order, find the total invoiced against the order, then subtract the two (the result also can't be negative). It has to be on a per-order basis; the overall aggregate difference will not return the expected results.
In the end I'm looking for a result something like this:
+-------------+-----------+
| cbs_item_id | remaining |
+-------------+-----------+
| 1 | 68 |
+-------------+-----------+
| 2 | 297 |
+-------------+-----------+
I am guessing this might be a combination of GROUP BY and perhaps a sub query or even CTE (voodoo to me). My SQL skills are not that great and this is WAY above my pay grade.
Here is a fiddle for the data above:
http://sqlfiddle.com/#!17/2fe3a
Alternate fiddle:
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=e9528042874206477efbe0f0e86326fb
This query produces the result you are looking for:
SELECT cbs_item_id
     , sum(GREATEST(order_total - invoice_total, 0)) AS remaining  -- never below 0 per order, as the question requires
FROM  (
   SELECT cbs_item_id
        , COALESCE(e.parent_id, e.id) AS expenditure_id                                  -- ①
        , COALESCE(sum(total) FILTER (WHERE e.category = 'order'  ), 0) AS order_total   -- ②
        , COALESCE(sum(total) FILTER (WHERE e.category = 'invoice'), 0) AS invoice_total
   FROM   expenditures e
   JOIN   expenditure_items i ON i.expenditure_id = e.id
   GROUP  BY 1, 2                                                                        -- ③
   ) sub
GROUP  BY 1
ORDER  BY 1;
① Note how I assume a saner table definition with expenditures.parent_id being integer, and true NULL instead of the string 'nil'. This allows the simple use of COALESCE.
② About the aggregate FILTER clause:
Aggregate columns with additional (distinct) filters
③ Using short syntax with ordinal numbers of SELECT list items. Example:
Select first row in each GROUP BY group?
Can I get the total of all the remaining values for all rows, or do I need to wrap that in another subselect?
There is a very concise option with GROUPING SETS:
...
GROUP BY GROUPING SETS ((1), ()) -- that's all :)
Related:
Converting rows to columns
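To check the per-order logic against the sample data without a PostgreSQL instance, here is a small sketch using Python's built-in sqlite3 module. It is an approximation under stated assumptions: SQLite stands in for PostgreSQL, CASE expressions replace the FILTER clause, SQLite's two-argument MAX() plays the role of GREATEST, and the grand total is summed in Python because SQLite has no GROUPING SETS. As in the answer, parent_id is assumed to be a true NULL for orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expenditures (id INTEGER PRIMARY KEY, category TEXT, parent_id INTEGER);
CREATE TABLE expenditure_items (
    id INTEGER PRIMARY KEY, expenditure_id INTEGER, cbs_item_id INTEGER, total INTEGER);
INSERT INTO expenditures VALUES
    (1, 'order', NULL), (2, 'order', NULL), (3, 'invoice', 2), (4, 'invoice', 2);
INSERT INTO expenditure_items VALUES
    (1, 1, 1, 5),  (2, 1, 2, 15), (3, 2, 1, 123), (4, 2, 2, 456),
    (5, 3, 1, 34), (6, 3, 2, 76), (7, 4, 1, 26),  (8, 4, 2, 98);
""")

# Inner query: one row per (cbs_item_id, order), with invoices folded into
# their parent order. Outer query: clamp each order's remainder at zero,
# then roll up per cbs_item_id.
ROLLUP_SQL = """
SELECT cbs_item_id, SUM(MAX(order_total - invoice_total, 0)) AS remaining
FROM (
    SELECT i.cbs_item_id,
           COALESCE(e.parent_id, e.id) AS order_id,
           SUM(CASE WHEN e.category = 'order'   THEN i.total ELSE 0 END) AS order_total,
           SUM(CASE WHEN e.category = 'invoice' THEN i.total ELSE 0 END) AS invoice_total
    FROM expenditures AS e
    JOIN expenditure_items AS i ON i.expenditure_id = e.id
    GROUP BY i.cbs_item_id, COALESCE(e.parent_id, e.id)
) AS per_order
GROUP BY cbs_item_id
ORDER BY cbs_item_id
"""

remaining = dict(conn.execute(ROLLUP_SQL))   # cbs_item_id -> remaining
grand_total = sum(remaining.values())        # total across all cbs_item_ids
```

Clamping inside the outer SUM, rather than on the final total, is what keeps an over-invoiced order from reducing the remainder of other orders in the same cbs_item_id.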

How do I use a historic value as at a particular month when there are no values for the given month?

I have 2 SQL Server tables.
PurchaseOrderReceivingLine (PORL) is a table that contains every receipt from a purchase order. This has hundreds of entries per month.
PartyRelationshipScore (PRS) is a table with a party (supplier) reference number (that is used to join to the PORL table) and a score out of 10 for relationship and price. It also has a date field for when the score is updated so we have a history of the updates.
What I want to achieve is a supplier summary for each month. So I would have Supplier #, TotalValue, LateParts etc. I'm fine with creating the code for that. What I'm struggling with is getting the score for the given month if there are no values for that month.
So, for example I might have a value of 5 on the 1st August. Then it doesn't change until the 1st October when it is increased to 6.
In the grouping, September will have a TotalValue and a LateParts value, but because there are no records in September in the PRS table, the score comes back as NULL. I need it to get the last value recorded and return that (in this case August's 5). So it will return:
Aug 2019 - 5
Sep 2019 - 5
Oct 2019 - 6
Thanks in advance.
PORL Table
+-------+----------------+-------+-------+
| PORL# | Date (UK) | Value | Party |
+-------+----------------+-------+-------+
| 1 | 1/8/2019 | 100 | 6 |
| 2 | 1/8/2019 | 250 | 6 |
| 3 | 1/9/2019 | 1000 | 6 |
| 4 | 1/10/2019 | 2000 | 6 |
+-------+----------------+-------+-------+
PRS Table
+------------------+-------+-------------------+------------+
| DateChanged (UK) | Party | RelationShipScore | PriceScore |
+------------------+-------+-------------------+------------+
| 1/8/2019         | 6     | 5                 | 5          |
| 1/10/2019        | 6     | 6                 | 7          |
+------------------+-------+-------------------+------------+
Preferred outcome
+----------+-------+------+------------+-------------------+------------+
| Supplier | Month | Year | TotalValue | RelationshipScore | PriceScore |
+----------+-------+------+------------+-------------------+------------+
| 6 | 8 | 2019 | 350 | 5 | 5 |
| 6 | 9 | 2019 | 1000 | 5 | 5 |
| 6 | 10 | 2019 | 2000 | 6 | 7 |
+----------+-------+------+------------+-------------------+------------+
The relationshipscore & pricescore for month 9 are based on it not changing from month 8.
I think this helps
select Supplier = T.Party
     , Month = DATEPART(MONTH, T.[Date])
     , Year = DATEPART(YEAR, T.[Date])
     , T.TotalValue
     , R.RelationShipScore
     , R.PriceScore
from ( -- total value per party and receipt date; this yields one row per date,
       -- so group on year and month instead if you need one row per month
       select P.[Party], P.[Date], [TotalValue] = sum(P.[Value])
       from PurchaseOrderReceivingLine P
       group by P.[Party], P.[Date] ) T
outer apply ( -- most recent score recorded on or before that date
              select top 1 RelationShipScore, PriceScore
              from PartyRelationshipScore
              where Party = T.Party
                and DateChanged <= T.[Date]
              order by DateChanged desc ) R
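The "latest score on or before a date" lookup can be tried out on the sample data with the sketch below. It makes several assumptions: Python's built-in sqlite3 module stands in for SQL Server, a correlated subquery with ORDER BY ... LIMIT 1 replaces OUTER APPLY ... TOP 1, the UK dates are rewritten in ISO form so they sort as text, and each month takes the score in force on its last receipt date (the question does not say which day of the month should decide).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PurchaseOrderReceivingLine (Id INTEGER, Date TEXT, Value INTEGER, Party INTEGER);
CREATE TABLE PartyRelationshipScore (
    DateChanged TEXT, Party INTEGER, RelationShipScore INTEGER, PriceScore INTEGER);
INSERT INTO PurchaseOrderReceivingLine VALUES
    (1, '2019-08-01', 100, 6), (2, '2019-08-01', 250, 6),
    (3, '2019-09-01', 1000, 6), (4, '2019-10-01', 2000, 6);
INSERT INTO PartyRelationshipScore VALUES
    ('2019-08-01', 6, 5, 5), ('2019-10-01', 6, 6, 7);
""")

# Step 1 (derived table m): totals per party and calendar month.
# Step 2 (correlated subqueries): the newest score row dated on or before
# the month's last receipt, so months without a score change inherit the
# previous value.
MONTHLY_SQL = """
SELECT m.Party, m.Month, m.TotalValue,
       (SELECT s.RelationShipScore FROM PartyRelationshipScore AS s
         WHERE s.Party = m.Party AND s.DateChanged <= m.LastReceipt
         ORDER BY s.DateChanged DESC LIMIT 1) AS RelationShipScore,
       (SELECT s.PriceScore FROM PartyRelationshipScore AS s
         WHERE s.Party = m.Party AND s.DateChanged <= m.LastReceipt
         ORDER BY s.DateChanged DESC LIMIT 1) AS PriceScore
FROM (
    SELECT Party,
           strftime('%Y-%m', Date) AS Month,
           SUM(Value)              AS TotalValue,
           MAX(Date)               AS LastReceipt
    FROM PurchaseOrderReceivingLine
    GROUP BY Party, strftime('%Y-%m', Date)
) AS m
ORDER BY m.Party, m.Month
"""

summary = conn.execute(MONTHLY_SQL).fetchall()
```

With the sample rows, September has no score record of its own and picks up the August values.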

How to exclude rows that have matching fields in other rows

I have a table in MS Access 2013 that has a number of different columns. As part of the data that is entered into the main table, there are duplicates in certain columns. However when I 'pot up' the volumes of rows based on their status, I need to be able to exclude those with the same values in other columns.
------------------------------------------------------------
HeaderID | Date | Number | EffectiveDate | Reg | Status
------------------------------------------------------------
2 | 01/01/2016| 100001 | 01/12/2015 | 01 | Ready
3 | 01/01/2016| 100001 | 01/12/2015 | 02 | Ready
4 | 02/02/2016| 100002 | 12/11/2015 | R | Pending
5 | 02/02/2016| 100002 | 12/11/2015 | T | Pending
6 | 02/02/2016| 100002 | 12/11/2015 | N | Pending
7 | 15/09/2015| 100003 | 30/11/2015 | 01 | Ready
8 | 14/09/2015| 100004 | 20/02/2016 | 01 | New
I have the basic below code already:
Select
tbl_Progression.Status,
Count(tbl_Progression.HeaderID) AS CountofHeaderID
From tbl_Progression
Group By tbl_Progression.Status
I'd like the results to look like the table below for the example data above: Status is still counted by HeaderID, but records that share the same Date, Number and EffectiveDate (and differ only in Reg) are counted once:
------------------------
Status | CountofHeaderID
------------------------
Pending | 1
Ready | 2
New | 1
Instead of what the current code is doing:
------------------------
Status | CountofHeaderID
------------------------
Pending | 3
Ready | 3
New | 1
MS Access doesn't support COUNT(DISTINCT). You can, however, use a subquery with DISTINCT (or GROUP BY):
Select p.Status, Count(*) as new_CountofHeaderID
From (select distinct p.Status, p.Date, p.Number, p.EffectiveDate
      from tbl_Progression as p
     ) as p
Group By p.Status;
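The same "deduplicate first, then count" shape can be checked against the example rows with Python's built-in sqlite3 module. This is only a sketch: SQLite is used in place of Access, and the dates are stored as plain text exactly as shown in the question.

```python
import sqlite3

# Example rows: (HeaderID, Date, Number, EffectiveDate, Reg, Status).
ROWS = [
    (2, "01/01/2016", 100001, "01/12/2015", "01", "Ready"),
    (3, "01/01/2016", 100001, "01/12/2015", "02", "Ready"),
    (4, "02/02/2016", 100002, "12/11/2015", "R", "Pending"),
    (5, "02/02/2016", 100002, "12/11/2015", "T", "Pending"),
    (6, "02/02/2016", 100002, "12/11/2015", "N", "Pending"),
    (7, "15/09/2015", 100003, "30/11/2015", "01", "Ready"),
    (8, "14/09/2015", 100004, "20/02/2016", "01", "New"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_Progression "
    "(HeaderID INTEGER, Date TEXT, Number INTEGER, EffectiveDate TEXT, Reg TEXT, Status TEXT)"
)
conn.executemany("INSERT INTO tbl_Progression VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Counting every row: duplicates that differ only in Reg are all counted.
naive = dict(conn.execute(
    "SELECT Status, COUNT(HeaderID) FROM tbl_Progression GROUP BY Status"
))

# Collapsing to distinct (Status, Date, Number, EffectiveDate) before counting.
deduplicated = dict(conn.execute("""
    SELECT Status, COUNT(*)
    FROM (SELECT DISTINCT Status, Date, Number, EffectiveDate FROM tbl_Progression) AS d
    GROUP BY Status
"""))
```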

Strange window function behaviour

I have the following set of data:
player | score | day
--------+-------+------------
John | 3 | 02-01-2014
John | 5 | 02-02-2014
John | 7 | 02-03-2014
John | 9 | 02-04-2014
John | 11 | 02-05-2014
John | 13 | 02-06-2014
Mark | 2 | 02-01-2014
Mark | 4 | 02-02-2014
Mark | 6 | 02-03-2014
Mark | 8 | 02-04-2014
Mark | 10 | 02-05-2014
Mark | 12 | 02-06-2014
Given two time ranges:
02-01-2014..02-03-2014
02-04-2014..02-06-2014
I need to get average score for each player within a given time range. Ultimate result I'm trying to achieve is this:
player | period_1_score | period_2_score
--------+----------------+----------------
John | 5 | 11
Mark | 4 | 10
The original algorithm I came up with was:
perform SELECT with two values, derived by partitioning the set of scores into two for each time period
over the first SELECT, perform another one, grouping the set by player name.
I'm stuck on step 1: running the following query:
SELECT
player,
AVG(score) OVER (PARTITION BY day BETWEEN '02-01-2014' AND '02-03-2014') AS period_1,
AVG(score) OVER (PARTITION BY day BETWEEN '02-04-2014' AND '02-06-2014') AS period_2;
This gets me an incorrect result (note how the period 1 and period 2 average scores are the same):
player | period_1_score | period_2_score
--------+----------------+----------------
John | 5 | 5
John | 5 | 5
John | 5 | 5
John | 5 | 5
John | 5 | 5
John | 5 | 5
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
I think I don't fully understand how window functions work... I have 2 questions:
What is wrong with my query?
How do I do it right?
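You don't need a window function for this. PARTITION BY with a boolean expression only splits the rows into two groups (inside the range and outside it) and averages each group separately; it does not filter rows, and without GROUP BY nothing collapses them to one row per player. Conditional aggregation does both.
Try:
select
    player
   ,avg(case when day BETWEEN '02-01-2014' AND '02-03-2014' then score else null end) as period_1_score
   ,avg(case when day BETWEEN '02-04-2014' AND '02-06-2014' then score else null end) as period_2_score
from <your data>
group by player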
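Here is a runnable sketch of that conditional aggregation on the question's data, using Python's built-in sqlite3 module. Two things are assumptions of the sketch rather than part of the original: the table name scores, and the dates being stored in ISO form (YYYY-MM-DD) so that BETWEEN compares them correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, score INTEGER, day TEXT)")

# John scores 3, 5, ..., 13 and Mark scores 2, 4, ..., 12 on 1-6 Feb 2014.
rows = []
for offset in range(6):
    day = "2014-02-%02d" % (offset + 1)
    rows.append(("John", 3 + 2 * offset, day))
    rows.append(("Mark", 2 + 2 * offset, day))
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)

# A CASE without ELSE yields NULL outside the period, and AVG skips NULLs,
# so each column averages only the rows that fall inside its own range.
PERIOD_SQL = """
SELECT player,
       AVG(CASE WHEN day BETWEEN '2014-02-01' AND '2014-02-03' THEN score END) AS period_1_score,
       AVG(CASE WHEN day BETWEEN '2014-02-04' AND '2014-02-06' THEN score END) AS period_2_score
FROM scores
GROUP BY player
ORDER BY player
"""

averages = conn.execute(PERIOD_SQL).fetchall()
```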

SQL query to get the same set of results

This should be a simple one, but say I have a table with data like this:
| ID | Date | Value |
| 1 | 01/01/2013 | 40 |
| 2 | 03/01/2013 | 20 |
| 3 | 10/01/2013 | 30 |
| 4 | 14/02/2013 | 60 |
| 5 | 15/03/2013 | 10 |
| 6 | 27/03/2013 | 70 |
| 7 | 01/04/2013 | 60 |
| 8 | 01/06/2013 | 20 |
What I want is the sum of values per week of the year, showing ALL weeks.. (for use in an excel graph)
What my query gives me, is only the weeks that are actually in the database.
With SQL you cannot return rows that don't exist in some table. To get the effect you want you could create a table called WeeksInYear with only one field WeekNumber that is an Int. Populate the table with all the week numbers. Then JOIN that table to this one.
The query would then look something like the following:
SELECT w.WeekNumber, COALESCE(SUM(m.Value), 0) AS TotalValue
FROM MyTable AS m
RIGHT OUTER JOIN WeeksInYear AS w
    ON DATEPART(wk, m.date) = w.WeekNumber
GROUP BY w.WeekNumber
The missing weeks have no matching rows in MyTable, so SUM returns NULL for them; COALESCE turns that NULL into 0.
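The weeks-table approach can be sketched end to end with Python's built-in sqlite3 module. Treat the details as assumptions: SQLite replaces the original database, strftime('%W') (Monday-based, numbered 0 to 53) replaces DATEPART(wk, ...), so week numbers may differ from SQL Server's, the dates are rewritten in ISO form, and a LEFT JOIN from the weeks table is used as the mirror image of the RIGHT OUTER JOIN above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Date TEXT, Value INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (1, "2013-01-01", 40), (2, "2013-01-03", 20), (3, "2013-01-10", 30),
    (4, "2013-02-14", 60), (5, "2013-03-15", 10), (6, "2013-03-27", 70),
    (7, "2013-04-01", 60), (8, "2013-06-01", 20),
])

# Helper table holding every week number that '%W' can produce.
conn.execute("CREATE TABLE WeeksInYear (WeekNumber INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO WeeksInYear VALUES (?)", [(n,) for n in range(54)])

# Start from the weeks table so every week yields a row, then substitute 0
# where the join found nothing to sum.
WEEKLY_SQL = """
SELECT w.WeekNumber, COALESCE(SUM(m.Value), 0) AS TotalValue
FROM WeeksInYear AS w
LEFT JOIN MyTable AS m
       ON CAST(strftime('%W', m.Date) AS INTEGER) = w.WeekNumber
GROUP BY w.WeekNumber
ORDER BY w.WeekNumber
"""

weekly_totals = dict(conn.execute(WEEKLY_SQL))  # WeekNumber -> TotalValue
```

If the data spans more than one year, the helper table and the join condition would also need a year column; the sketch covers a single year only.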