How do I use a historic value as at a particular month when there are no values for the given month? - sql

I have 2 SQL Server tables.
PurchaseOrderReceivingLine (PORL) is a table that contains every receipt from a purchase order. This has hundreds of entries per month.
PartyRelationshipScore (PRS) is a table with a party (supplier) reference number (that is used to join to the PORL table) and a score out of 10 for relationship and price. It also has a date field for when the score is updated so we have a history of the updates.
What I want to achieve is a supplier summary for each month. So I would have Supplier #, TotalValue, LateParts etc. I'm fine with creating the code for that. What I'm struggling with is getting the score for the given month if there are no values for that month.
So, for example I might have a value of 5 on the 1st August. Then it doesn't change until the 1st October when it is increased to 6.
On the grouping, September will have a TotalValue & a LateParts value, but because there are no records in September in the PRS table, it will return a NULL value. I need it to get the last value recorded and return that (in this case August's 5). So it will return:
Aug 2019 - 5
Sep 2019 - 5
Oct 2019 - 6
Thanks in advance.
PORL Table
+-------+-----------+-------+-------+
| PORL# | Date (UK) | Value | Party |
+-------+-----------+-------+-------+
| 1     | 1/8/2019  | 100   | 6     |
| 2     | 1/8/2019  | 250   | 6     |
| 3     | 1/9/2019  | 1000  | 6     |
| 4     | 1/10/2019 | 2000  | 6     |
+-------+-----------+-------+-------+
PRS Table
+------------------+-------+-------------------+------------+
| DateChanged (UK) | Party | RelationShipScore | PriceScore |
+------------------+-------+-------------------+------------+
| 1/8/2019         | 6     | 5                 | 5          |
| 1/10/2019        | 6     | 6                 | 7          |
+------------------+-------+-------------------+------------+
Preferred outcome
+----------+-------+------+------------+-------------------+------------+
| Supplier | Month | Year | TotalValue | RelationshipScore | PriceScore |
+----------+-------+------+------------+-------------------+------------+
| 6        | 8     | 2019 | 350        | 5                 | 5          |
| 6        | 9     | 2019 | 1000       | 5                 | 5          |
| 6        | 10    | 2019 | 2000       | 6                 | 7          |
+----------+-------+------+------------+-------------------+------------+
The RelationshipScore & PriceScore for month 9 are based on the score not having changed since month 8.

I think this helps
select Supplier = T.Party
     , Month    = DATEPART(MONTH, T.MonthStart)
     , Year     = DATEPART(YEAR, T.MonthStart)
     , T.TotalValue
     , R.RelationShipScore
     , R.PriceScore
from ( select P.[Party]
            , MonthStart   = DATEADD(MONTH, DATEDIFF(MONTH, 0, P.[Date]), 0)
            , [TotalValue] = SUM(P.[Value])
       from PurchaseOrderReceivingLine P
       group by P.[Party], DATEADD(MONTH, DATEDIFF(MONTH, 0, P.[Date]), 0) ) T
outer apply ( select top 1 RelationShipScore, PriceScore
              from PartyRelationshipScore
              where Party = T.Party
                and DateChanged < DATEADD(MONTH, 1, T.MonthStart)
              order by DateChanged desc ) R
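To sanity-check this against the sample data in the question, a minimal sketch like the following (table variables standing in for the real tables, with the UK-format dates rewritten as ISO literals) should reproduce the preferred outcome:
declare @PORL table ([PORL#] int, [Date] date, [Value] int, Party int);
declare @PRS  table (DateChanged date, Party int, RelationShipScore int, PriceScore int);
insert @PORL values (1, '2019-08-01', 100, 6), (2, '2019-08-01', 250, 6),
                    (3, '2019-09-01', 1000, 6), (4, '2019-10-01', 2000, 6);
insert @PRS  values ('2019-08-01', 6, 5, 5), ('2019-10-01', 6, 6, 7);
-- Substituting @PORL/@PRS for the real tables, the query above should return
-- (6, 8, 2019, 350, 5, 5), (6, 9, 2019, 1000, 5, 5) and (6, 10, 2019, 2000, 6, 7).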

Related

How to get last value for each user_id (PostgreSQL)

The current ratio of a user is their last inserted ratio in the "Ratio History" table
user_id | year | month | ratio
For example if user with ID 1 has two rows
1 | 2019 | 2 | 10
1 | 2019 | 3 | 15
his ratio is 15.
Here is a slice from the development table
user_id | year | month | ratio
1 | 2018 | 7 | 10
2 | 2018 | 8 | 20
3 | 2018 | 8 | 30
1 | 2019 | 1 | 40
2 | 2019 | 2 | 50
3 | 2018 | 10 | 60
2 | 2019 | 3 | 70
I need a query that selects rows grouped by user_id, each with that user's last ratio.
As a result of the request, the following entries should be selected
user_id | year | month | ratio
1 | 2019 | 1 | 40
2 | 2019 | 3 | 70
3 | 2018 | 10 | 60
I tried this query
select rh1.user_id, ratio, rh1.year, rh1.month
from ratio_history rh1
join (
    select user_id, max(year) as maxYear, max(month) as maxMonth
    from ratio_history
    group by user_id
) rh2 on rh1.user_id = rh2.user_id
     and rh1.year = rh2.maxYear
     and rh1.month = rh2.maxMonth
but I only got one row.
Use distinct on:
select distinct on (user_id) rh.*
from ratio_history rh
order by user_id, year desc, month desc;
distinct on is a very convenient Postgres extension. It returns one row per set of key values in parentheses. Which row? The first row based on the sort criteria. Note that the sort criteria need to start with the expressions in parentheses.
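If you ever need the same result on a database without distinct on, a row_number() version of the same idea is a portable sketch (assuming the same ratio_history columns):
select user_id, year, month, ratio
from (
    select rh.*,
           row_number() over (partition by user_id
                              order by year desc, month desc) as rn
    from ratio_history rh
) ranked
where rn = 1;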

Slicing account balance data in BigQuery to generate a debit report

I have a collection of account balances over time:
+-----------------+------------+-------------+---------------------+
| account_balance | department | customer_id | timestamp           |
+-----------------+------------+-------------+---------------------+
| 5               | A          | 1           | 2019-02-12T00:00:00 |
| -10             | A          | 1           | 2019-02-13T00:00:00 |
| -35             | A          | 1           | 2019-02-14T00:00:00 |
| 20              | A          | 1           | 2019-02-15T00:00:00 |
+-----------------+------------+-------------+---------------------+
Each record shows the total account balance of a customer at a specified timestamp. The account balance increases, e.g. from -35 to 20, when a customer tops up their account with 55. As a customer uses services, their account balance decreases, e.g. from 5 to -10.
I want to aggregate this data in two ways:
1) Get the debit, credit and balance (credit-debit) of a department per month and year. The results from April should be a summary of all previous months:
+---------+--------+-------+------------+-------+------+
| balance | credit | debit | department | month | year |
+---------+--------+-------+------------+-------+------+
| 5       | 10     | -5    | A          | 1     | 2019 |
| 20      | 32     | -12   | A          | 2     | 2019 |
| 35      | 52     | -17   | A          | 3     | 2019 |
| 51      | 70     | -19   | A          | 4     | 2019 |
+---------+--------+-------+------------+-------+------+
A customer's account balance might not change every month. There might be account balance records of customer 1 in February, but not March.
Notes towards the solution:
use EXTRACT(MONTH from timestamp) month
use EXTRACT(YEAR from timestamp) year
GROUP BY month, year, department
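Building on those notes, a rough sketch for part 1 could look like this (an assumption-heavy sketch: the table is called balances here, a customer's first record counts entirely as its own delta, and months with no activity would still need a generated calendar to show up in the output):
WITH deltas AS (
  SELECT
    department,
    EXTRACT(YEAR FROM timestamp) AS year,
    EXTRACT(MONTH FROM timestamp) AS month,
    account_balance
      - IFNULL(LAG(account_balance) OVER (PARTITION BY customer_id
                                          ORDER BY timestamp), 0) AS delta
  FROM balances
),
monthly AS (
  SELECT department, year, month,
         SUM(IF(delta > 0, delta, 0)) AS credit_delta,
         SUM(IF(delta < 0, delta, 0)) AS debit_delta
  FROM deltas
  GROUP BY department, year, month
)
SELECT department, year, month,
       SUM(credit_delta) OVER w + SUM(debit_delta) OVER w AS balance,
       SUM(credit_delta) OVER w AS credit,
       SUM(debit_delta)  OVER w AS debit
FROM monthly
WINDOW w AS (PARTITION BY department ORDER BY year, month)
ORDER BY department, year, month;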
2) Get the change of debit, credit and balance of a department by date.
+---------+--------+-------+------------+------------+
| balance | credit | debit | department | date       |
+---------+--------+-------+------------+------------+
| 5       | 10     | -5    | A          | 2019-01-15 |
| 15      | 22     | -7    | A          | 2019-02-15 |
| 15      | 20     | -5    | A          | 2019-03-15 |
| 16      | 18     | -2    | A          | 2019-04-15 |
+---------+--------+-------+------------+------------+
(Column totals: 51, 70, -19.)
When I create a SUM of the deltas, I should get the same values as in the last row of the results in 1).
Notes towards the solution:
use account_balance - LAG(account_balance) OVER(PARTITION BY department ORDER BY timestamp ASC) delta to compute deltas
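A matching sketch for part 2 reuses the same LAG-based delta (same assumptions; note the delta is computed per customer rather than per department here, since balances are tracked per customer):
WITH deltas AS (
  SELECT
    department,
    DATE(timestamp) AS day,
    account_balance
      - IFNULL(LAG(account_balance) OVER (PARTITION BY customer_id
                                          ORDER BY timestamp), 0) AS delta
  FROM balances
)
SELECT department, day AS date,
       SUM(delta) AS balance,
       SUM(IF(delta > 0, delta, 0)) AS credit,
       SUM(IF(delta < 0, delta, 0)) AS debit
FROM deltas
GROUP BY department, day
ORDER BY department, day;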
Your question is unclear, but it sounds like you want to get the outstanding balance at any given point in time.
The following query does this for 1 point in time.
with calendar as (
    select cast('2019-06-01' as timestamp) as balance_calc_ts
),
most_recent_balance as (
    select customer_id, balance_calc_ts, max(timestamp) as most_recent_balance_ts
    from <table>
    cross join calendar
    where timestamp < balance_calc_ts -- or <=
    group by 1, 2
)
select t.customer_id, t.account_balance, mrb.balance_calc_ts
from <table> t
inner join most_recent_balance mrb
    on t.customer_id = mrb.customer_id
   and t.timestamp = mrb.most_recent_balance_ts
If you need to calculate it at a series of points in time, you will need to modify the calendar CTE to return more dates. This is the beauty of CROSS JOINS in BQ!
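For example, the calendar CTE could be expanded to month-start snapshots with GENERATE_DATE_ARRAY (a sketch; pick whatever grain and date range you need):
with calendar as (
    select timestamp(month_start) as balance_calc_ts
    from unnest(generate_date_array('2019-01-01', '2019-12-01', interval 1 month)) as month_start
)
-- ...rest of the query unchanged; you now get one "as of" balance per customer per month start.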

How to subtract previous value in a column with calculation of other column on SQL server

I have a requirement for a table as shown below, with mgt_year, tot_dflt_mgt and tot_accum_mgt columns. In the row for year 2016 the value is 20 and the accumulated value is 600. What I want is that when I do
(tot_accum_mgt - tot_dflt_mgt)
the calculated result lands in the previous row, as shown in the table below. That result, i.e. 580, then has 9 subtracted from it (580 - 9) for year 2015, and so on for all earlier years. I have done this in Excel and also in Oracle thanks to @mathguy, but how do I achieve this in SQL Server? I have tried the Oracle query in SQL Server but it's not working.
Please forgive my bad English and noob formatting.
My table t:
line_seg  MGT_YEAR  TOT_DFLT_MGT  TOT_ACCUM_MGT
--------  --------  ------------  -------------
A         2013      10
A         2014      15
A         2015      9
A         2016      20            600
B         2013      10
B         2014      15
B         2015      8
B         2016      20            500
Oracle Solution:
select mgt_year, tot_dflt_mgt,
       max(tot_accum_mgt) over () -
         nvl( sum(tot_dflt_mgt) over (order by mgt_year
                                      rows between 1 following and unbounded following)
            , 0 ) as tot_accum_mgt
from t;
but I am unable to use this in SQL Server.
required output
line_seg  MGT_YEAR  TOT_DFLT_MGT  TOT_ACCUM_MGT
--------  --------  ------------  -------------
A         2013      10            556
A         2014      15            571
A         2015      9             580
A         2016      20            600
B         2013      12            457
B         2014      15            472
B         2015      8             480
B         2016      20            500
select *,
       max(TOT_ACCUM_MGT) over (partition by line_seg) -
       isnull(sum(TOT_DFLT_MGT) over (partition by line_seg
                                      order by MGT_YEAR desc
                                      rows between unbounded preceding and 1 preceding), 0) as somecolname
from t
Add a ROW_NUMBER() and self join each row with the previous one on (a.ID = b.ID) and (a.row_num = b.row_num - 1)
OR
You can use the LAG() function.
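For example, a recursive CTE is one way to express that "self join with the previous row" idea in SQL Server (a sketch, assuming the question's table is t):
with numbered as (
    select line_seg, mgt_year, tot_dflt_mgt, tot_accum_mgt,
           row_number() over (partition by line_seg order by mgt_year desc) as rn
    from t
),
filled as (
    -- anchor: the latest year per line_seg already has tot_accum_mgt populated
    select line_seg, mgt_year, tot_dflt_mgt, tot_accum_mgt, rn
    from numbered
    where rn = 1
    union all
    -- each earlier year = the following year's accumulated value minus that year's tot_dflt_mgt
    select n.line_seg, n.mgt_year, n.tot_dflt_mgt,
           f.tot_accum_mgt - f.tot_dflt_mgt, n.rn
    from filled f
    join numbered n
      on n.line_seg = f.line_seg and n.rn = f.rn + 1
)
select line_seg, mgt_year, tot_dflt_mgt, tot_accum_mgt
from filled
order by line_seg, mgt_year;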
Please try the following query. I assumed that you are using a 2012+ version of SQL Server; if not, please change FIRST_VALUE to SUM:
SELECT t1.line_seg, t1.mgt_year, t1.[tot_dflt_mgt]
, FIRST_VALUE(t1.tot_accum_mgt) OVER(PARTITION BY t1.[line_seg] ORDER BY t1.mgt_year DESC)
- ISNULL(SUM(t2.[tot_dflt_mgt]) OVER(PARTITION BY t2.[line_seg] ORDER BY t2.mgt_year DESC), 0) AS tot_accum_mgt
FROM [dbo].[t] AS t1
LEFT JOIN [dbo].[t] AS t2 ON (t2.line_seg = t1.line_seg AND t2.mgt_year = t1.mgt_year + 1)
ORDER BY t1.line_seg, t1.mgt_year ASC;
To do this, first I have to imagine the table as sorted in descending order of mgt_year:
+----------+----------+--------------+---------------+
| line_seg | mgt_year | tot_dflt_mgt | tot_accum_mgt |
+----------+----------+--------------+---------------+
| A        | 2016     | 20           | 600           |
| A        | 2015     | 9            | NULL          |
| A        | 2014     | 15           | NULL          |
| A        | 2013     | 10           | NULL          |
| B        | 2016     | 20           | 500           |
| B        | 2015     | 8            | NULL          |
| B        | 2014     | 15           | NULL          |
| B        | 2013     | 12           | NULL          |
+----------+----------+--------------+---------------+
Then all I have to do is subtract the PREVIOUS running total of tot_dflt_mgt from the latest year's tot_accum_mgt. This is equivalent to subtracting the previous tot_dflt_mgt from the current computed value of tot_accum_mgt. To use the previous year's fields, a LEFT JOIN is used to self join the table, resulting in the following table:
+----------+----------+--------------+---------------+----------+----------+--------------+---------------+
| line_seg | mgt_year | tot_dflt_mgt | tot_accum_mgt | line_seg | mgt_year | tot_dflt_mgt | tot_accum_mgt |
+----------+----------+--------------+---------------+----------+----------+--------------+---------------+
| A        | 2013     | 10           | NULL          | A        | 2014     | 15           | NULL          |
| A        | 2014     | 15           | NULL          | A        | 2015     | 9            | NULL          |
| A        | 2015     | 9            | NULL          | A        | 2016     | 20           | 600           |
| A        | 2016     | 20           | 600           | NULL     | NULL     | NULL         | NULL          |
| B        | 2013     | 12           | NULL          | B        | 2014     | 15           | NULL          |
| B        | 2014     | 15           | NULL          | B        | 2015     | 8            | NULL          |
| B        | 2015     | 8            | NULL          | B        | 2016     | 20           | 500           |
| B        | 2016     | 20           | 500           | NULL     | NULL     | NULL         | NULL          |
+----------+----------+--------------+---------------+----------+----------+--------------+---------------+
The AND t2.mgt_year = t1.mgt_year + 1 filter in the LEFT JOIN clause does the trick of getting the previous row's value. Now all I had to do was calculate the running total over these previous rows (t2). Also, since subtracting NULL from anything results in NULL, ISNULL replaces any NULL with zero.
ISNULL(SUM(t2.[tot_dflt_mgt]) OVER(PARTITION BY t2.[line_seg] ORDER BY t2.mgt_year DESC), 0) AS tot_accum_mgt
Now that we have the previous running total of tot_dflt_mgt, all we have to do is fetch the latest (largest mgt_year) tot_accum_mgt. We get that by using the FIRST_VALUE function. SUM could also be used instead, I guess.
FIRST_VALUE(t1.tot_accum_mgt) OVER(PARTITION BY t1.[line_seg] ORDER BY t1.mgt_year DESC)

SQL - Adding an avg column to a detail table

I'm on Teradata. I have an order table like the below.
custID | orderID | month | order_amount
----------------------------------------
1      | 1       | jan   | 10
1      | 2       | jan   | 20
1      | 3       | feb   | 5
1      | 4       | feb   | 7
2      | 5       | mar   | 20
2      | 6       | apr   | 30
I'd like to add a column to the above table called "Avg order amount per month per customer". Since the table is at an order level, adding this column will cause duplicates like the below, which is ok.
custID | orderID | month | order_amount | avgOrdAmtperMonth
------------------------------------------------------------
1      | 1       | jan   | 10           | 15
1      | 2       | jan   | 20           | 15
1      | 3       | feb   | 5            | 6
1      | 4       | feb   | 7            | 6
2      | 5       | mar   | 20           | 20
2      | 6       | apr   | 30           | 30
I want the output to have all the columns above, not just the custID and the new column. I'm not sure how to write this because one part of the table is at an order level and the new column needs to be grouped by customer+month. How would I do this?
This is a simple group average:
AVG(order_amount) OVER (PARTITION BY custID, month)
Why not just do the calculation when you query the table?
select t.*,
avg(order_amount) over (partition by custId, month) as avgOrderAmtPerMonth
from t;
You can add this into a view if you want to make it available to multiple downstream queries.
Actually adding the column to the table is a maintenance "nightmare". You have to add triggers to the table and update the value for updates, inserts, and deletes.
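For example, a sketch of such a view (assuming the order table is called orders):
create view orders_with_monthly_avg as
select t.*,
       avg(order_amount) over (partition by custID, month) as avgOrdAmtperMonth
from orders t;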

Distinct lists on dates where an ID is present (i.e. intersects) on consecutive dates

I'm trying to make an MSSQL query that produces lists of apartment prices. The ultimate goal of the query is to calculate the percentage change in average prices of apartments. However, this final calculation (namely taking averages) is something I can fix in code provided that the list(s) of prices that are retrieved are correct.
What makes this tricky is that apartments are sold and new ones added all the time, so when comparing prices from week to week (I have weekly data), I only want to compare prices for apartments that have a recorded price in weeks (t-1, t), (t, t+1), (t+1,t+2) etc. In other words, some apartments that had a recorded price in time (t-1) might not be there at time t, and some apartments may have been added at time t (and thus weren't there at time t-1). I only want to select prices in week t-1 and t where some ApartmentID exists in both week t-1 and t to calculate the average change in week t.
Example data
-----------------------------------------------------
| RegistrationID | Date       | Price | ApartmentID |
-----------------------------------------------------
| 1              | 2014-04-04 | 5     | 1           |
| 2              | 2014-04-04 | 6     | 2           |
| 3              | 2014-04-04 | 4     | 3           |
| 4              | 2014-04-11 | 5.2   | 1           |
| 5              | 2014-04-11 | 4     | 3           |
| 6              | 2014-04-11 | 7     | 4           |
| 7              | 2014-04-19 | 5.1   | 1           |
| 8              | 2014-04-19 | 4.1   | 3           |
| 9              | 2014-04-19 | 7.1   | 4           |
| 10             | 2014-04-26 | 4.1   | 3           |
| 11             | 2014-04-26 | 7.2   | 4           |
-----------------------------------------------------
Solution thoughts
I think it makes sense to produce two different lists, one for odd-numbered weeks and one for even-numbered weeks. List 1 would then contain Date, Price and ApartmentID that are valid for the tuples (t-1,t), (t+1,t+2), (t+3,t+4) etc. while list 2 would contain the same for the tuples (t,t+1),(t+2,t+3),(t+4,t+5) etc. The reason I think two lists are needed is that for any given week t, there are two sets of apartments and corresponding prices that need to be produced - one that is "forward compatible" and one that is "backwards compatible".
If two such lists can be produced, then the rest is simply an exercise in taking averages over each distinct date.
I'm not really sure to begin here. I played a little around with Intersect, but I'm pretty sure I need to nest queries to get this to work.
Result
Using the methodology described above would yield two lists.
List 1:
Notice how RegistrationID 2 and 6 disappear because they don't exist on both dates 2014-04-04 and 2014-04-11. The same goes for RegistrationID 7, as this apartment doesn't exist for both 2014-04-19 and 2014-04-26.
-----------------------------------------------------
| RegistrationID | Date       | Price | ApartmentID |
-----------------------------------------------------
| 1              | 2014-04-04 | 5     | 1           |
| 3              | 2014-04-04 | 4     | 3           |
| 4              | 2014-04-11 | 5.2   | 1           |
| 5              | 2014-04-11 | 4     | 3           |
| 8              | 2014-04-19 | 4.1   | 3           |
| 9              | 2014-04-19 | 7.1   | 4           |
| 10             | 2014-04-26 | 4.1   | 3           |
| 11             | 2014-04-26 | 7.2   | 4           |
-----------------------------------------------------
List 2:
Here, nothing disappears because every apartment is present in the tuples within the scope of this list.
-----------------------------------------------------
| RegistrationID | Date       | Price | ApartmentID |
-----------------------------------------------------
| 4              | 2014-04-11 | 5.2   | 1           |
| 5              | 2014-04-11 | 4     | 3           |
| 6              | 2014-04-11 | 7     | 4           |
| 7              | 2014-04-19 | 5.1   | 1           |
| 8              | 2014-04-19 | 4.1   | 3           |
| 9              | 2014-04-19 | 7.1   | 4           |
-----------------------------------------------------
Here's a solution. First, I get all the records from the table (I named it "ApartmentPrice"), computing WeekOf (the first day of that week), PreviousWeek (the first day of the previous week), and NextWeek (the first day of the following week). I store that in a table variable (you could also put it in a CTE or a temp table).
declare @tempTable table (RegistrationId int, PriceDate date, Price decimal(8,2), ApartmentId int,
                          WeekOf date, PreviousWeek date, NextWeek date)
insert @tempTable
select ap.RegistrationId,
       ap.PriceDate,
       ap.Price,
       ap.ApartmentId,
       DATEADD(ww, DATEDIFF(ww, 0, ap.PriceDate), 0) WeekOf,
       DATEADD(ww, DATEDIFF(ww, 0, dateadd(wk, -1, ap.PriceDate)), 0) PreviousWeek,
       DATEADD(ww, DATEDIFF(ww, 0, dateadd(wk, 1, ap.PriceDate)), 0) NextWeek
from ApartmentPrice ap
Then I join that table variable to itself where WeekOf equals either NextWeek or PreviousWeek. This gives the apartments that have a record in the adjoining week.
select distinct t.RegistrationId, t.PriceDate, t.Price, t.ApartmentId
from @tempTable t
join @tempTable t2 on t.ApartmentId = t2.ApartmentId
                  and (t.WeekOf = t2.PreviousWeek or t.WeekOf = t2.NextWeek)
order by t.RegistrationId, t.ApartmentId, t.PriceDate
I'm using distinct because an apartment will appear more than once in the results if it does have an adjoining week record.
You can also find the average prices for each week like this:
select t.WeekOf, avg(distinct t.Price)
from @tempTable t
join @tempTable t2 on t.ApartmentId = t2.ApartmentId
                  and (t.WeekOf = t2.PreviousWeek or t.WeekOf = t2.NextWeek)
group by t.WeekOf
order by t.WeekOf
Here's a SQL Fiddle. I added a few more rows to the test data to show that it handles dates that cross the end of the year boundary.