SQL - Adding an avg column to a detail table

I'm on Teradata. I have an order table like the below.
custID | orderID | month | order_amount
-----------------------------------------
1 | 1 | jan | 10
1 | 2 | jan | 20
1 | 3 | feb | 5
1 | 4 | feb | 7
2 | 5 | mar | 20
2 | 6 | apr | 30
I'd like to add a column to the above table called "Avg order amount per month per customer". Since the table is at the order level, the new value will repeat across rows as shown below, which is fine.
custID | orderID | month | order_amount | avgOrdAmtperMonth
-------------------------------------------------------------
1 | 1 | jan | 10 | 15
1 | 2 | jan | 20 | 15
1 | 3 | feb | 5 | 6
1 | 4 | feb | 7 | 6
2 | 5 | mar | 20 | 20
2 | 6 | apr | 30 | 30
I want the output to have all the columns above, not just custID and the new column. I'm not sure how to write this because the table is at the order level, while the new column needs to be grouped by customer and month. How would I do this?

This is a simple group average:
AVG(order_amount) OVER (PARTITION BY custID, month)

Why not just do the calculation when you query the table?
select t.*,
avg(order_amount) over (partition by custId, month) as avgOrderAmtPerMonth
from t;
You can add this into a view if you want to make it available to multiple downstream queries.
Actually adding the column to the table is a maintenance "nightmare": you would need triggers on the table to keep the value up to date on every insert, update, and delete.
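The window-function query above can be tried end to end with a small sketch. It assumes Python's bundled sqlite3 module (SQLite 3.25 or later, for window functions) standing in for Teradata, and a hypothetical table named orders loaded with the sample rows from the question. The AVG ... OVER (PARTITION BY ...) expression is the same one used in the answers.

```python
import sqlite3

# In-memory SQLite database standing in for Teradata (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (custID INTEGER, orderID INTEGER, month TEXT, order_amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, "jan", 10), (1, 2, "jan", 20),
        (1, 3, "feb", 5), (1, 4, "feb", 7),
        (2, 5, "mar", 20), (2, 6, "apr", 30),
    ],
)

# Every order row is kept; the average is computed per (custID, month) partition
# and repeated on each row of that partition.
rows = conn.execute(
    """
    SELECT o.*,
           AVG(order_amount) OVER (PARTITION BY custID, month) AS avgOrdAmtPerMonth
    FROM orders o
    ORDER BY orderID
    """
).fetchall()

for row in rows:
    print(row)
```

Because no GROUP BY is involved, the result keeps one row per order, which is what the question asks for.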

Related

How do I use a historic value as at a particular month when there are no values for the given month?

I have 2 SQL Server tables.
PurchaseOrderReceivingLine (PORL) is a table that contains every receipt from a purchase order. This has hundreds of entries per month.
PartyRelationshipScore (PRS) is a table with a party (supplier) reference number (that is used to join to the PORL table) and a score out of 10 for relationship and price. It also has a date field for when the score is updated so we have a history of the updates.
What I want to achieve is a supplier summary for each month. So I would have Supplier #, TotalValue, LateParts etc. I'm fine with creating the code for that. What I'm struggling with is getting the score for the given month if there are no values for that month.
So, for example, I might have a value of 5 on the 1st of August. Then it doesn't change until the 1st of October, when it is increased to 6.
In the grouping, September will have a TotalValue and a LateParts value, but because there are no records for September in the PRS table, the score comes back as NULL. I need it to pick up the last value recorded instead (in this case August's 5). So it would return:
Aug 2019 - 5
Sep 2019 - 5
Oct 2019 - 6
Thanks in advance.
PORL Table
+-------+----------------+-------+-------+
| PORL# | Date (UK) | Value | Party |
+-------+----------------+-------+-------+
| 1 | 1/8/2019 | 100 | 6 |
| 2 | 1/8/2019 | 250 | 6 |
| 3 | 1/9/2019 | 1000 | 6 |
| 4 | 1/10/2019 | 2000 | 6 |
+-------+----------------+-------+-------+
PRS Table
+------------------+-------+-------------------+------------+
| DateChanged (UK) | Party | RelationShipScore | PriceScore |
+------------------+-------+-------------------+------------+
| 1/8/2019         | 6     | 5                 | 5          |
| 1/10/2019        | 6     | 6                 | 7          |
+------------------+-------+-------------------+------------+
Preferred outcome
+----------+-------+------+------------+-------------------+------------+
| Supplier | Month | Year | TotalValue | RelationshipScore | PriceScore |
+----------+-------+------+------------+-------------------+------------+
| 6 | 8 | 2019 | 350 | 5 | 5 |
| 6 | 9 | 2019 | 1000 | 5 | 5 |
| 6 | 10 | 2019 | 2000 | 6 | 7 |
+----------+-------+------+------------+-------------------+------------+
The RelationshipScore and PriceScore for month 9 carry over from month 8 because they did not change.
I think this helps
select Supplier = T.Party
, Month = DATEPART(MONTH,T.[Date])
, Year = DATEPART(YEAR,T.[Date])
, T.TotalValue
, R.RelationShipScore
, R.PriceScore
from ( Select P.[Party],P.[Date],[TotalValue] = sum(P.[Value])
from PurchaseOrderReceivingLine P
group by P.[Party],P.[Date] ) T
outer apply ( select top 1 RelationShipScore , PriceScore
from PartyRelationshipScore
where Party = T.Party
and DateChanged <= T.[Date]
Order by DateChanged desc ) R
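The "latest score on or before a date" lookup can be sketched in a runnable form. This is not the OUTER APPLY query itself: SQLite has no OUTER APPLY, so a correlated subquery inside a LEFT JOIN is swapped in to do the same job. The lowercase table and column names, the ISO date strings, and the choice to take the score in effect at the end of each month are all assumptions made for the example. It also groups by month rather than by exact receipt date, which is what the question's summary asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas mirroring the PORL and PRS samples; dates stored as ISO strings.
conn.execute("CREATE TABLE porl (porl_no INTEGER, recv_date TEXT, receipt_value INTEGER, party INTEGER)")
conn.execute("CREATE TABLE prs (date_changed TEXT, party INTEGER, relationship_score INTEGER, price_score INTEGER)")
conn.executemany("INSERT INTO porl VALUES (?, ?, ?, ?)", [
    (1, "2019-08-01", 100, 6), (2, "2019-08-01", 250, 6),
    (3, "2019-09-01", 1000, 6), (4, "2019-10-01", 2000, 6),
])
conn.executemany("INSERT INTO prs VALUES (?, ?, ?, ?)", [
    ("2019-08-01", 6, 5, 5), ("2019-10-01", 6, 6, 7),
])

summary = conn.execute(
    """
    WITH monthly AS (
        SELECT party,
               CAST(strftime('%Y', recv_date) AS INTEGER) AS yr,
               CAST(strftime('%m', recv_date) AS INTEGER) AS mon,
               SUM(receipt_value) AS total_value,
               -- last day of the month: the "as of" date for the score lookup
               MAX(date(recv_date, 'start of month', '+1 month', '-1 day')) AS month_end
        FROM porl
        GROUP BY party, strftime('%Y', recv_date), strftime('%m', recv_date)
    )
    SELECT m.party, m.mon, m.yr, m.total_value, s.relationship_score, s.price_score
    FROM monthly m
    LEFT JOIN prs s
           ON s.party = m.party
          -- most recent score change on or before the end of that month
          AND s.date_changed = (SELECT MAX(p.date_changed)
                                FROM prs p
                                WHERE p.party = m.party
                                  AND p.date_changed <= m.month_end)
    ORDER BY m.party, m.yr, m.mon
    """
).fetchall()

for row in summary:
    print(row)
```

September has no PRS row of its own, so the subquery falls back to the August entry, and a supplier with no score history at all would still appear with NULL scores because of the LEFT JOIN.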

How to get last value for each user_id (postgreSQL)

The current ratio of a user is the last ratio inserted for that user in the table "Ratio History":
user_id | year | month | ratio
For example, if the user with ID 1 has two rows
1 | 2019 | 2 | 10
1 | 2019 | 3 | 15
his ratio is 15.
Here is a slice of the development table:
user_id | year | month | ratio
1 | 2018 | 7 | 10
2 | 2018 | 8 | 20
3 | 2018 | 8 | 30
1 | 2019 | 1 | 40
2 | 2019 | 2 | 50
3 | 2018 | 10 | 60
2 | 2019 | 3 | 70
I need a query which will select grouped rows by user_id and their last ratio.
As a result of the request, the following entries should be selected
user_id | year | month | ratio
1 | 2019 | 1 | 40
2 | 2019 | 3 | 70
3 | 2018 | 10 | 60
I tried to use this query
select rh1.user_id, ratio, rh1.year, rh1.month from ratio_history rh1
join (
select user_id, max(year) as maxYear, max(month) as maxMonth
from ratio_history group by user_id
) rh2 on rh1.user_id = rh2.user_id and rh1.year = rh2.maxYear and rh1.month = rh2.maxMonth
but I got only one row.
Use distinct on:
select distinct on (user_id) rh.*
from ratio_history rh
order by user_id, year desc, month desc;
distinct on is a very convenient Postgres extension. It returns one row for each combination of the key values in parentheses. Which row? The first row based on the sort criteria. Note that the sort criteria need to start with the expressions in parentheses.
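A short sketch shows why the attempted join returns a single row and what a portable equivalent of distinct on looks like. It assumes Python's sqlite3 module (SQLite 3.25 or later) instead of Postgres. SQLite has no DISTINCT ON, so ROW_NUMBER() is swapped in here as a different technique that gives the same result on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratio_history (user_id INTEGER, year INTEGER, month INTEGER, ratio INTEGER)")
conn.executemany("INSERT INTO ratio_history VALUES (?, ?, ?, ?)", [
    (1, 2018, 7, 10), (2, 2018, 8, 20), (3, 2018, 8, 30), (1, 2019, 1, 40),
    (2, 2019, 2, 50), (3, 2018, 10, 60), (2, 2019, 3, 70),
])

# The query from the question: MAX(year) and MAX(month) are computed independently,
# so user 1 is looked up as (2019, 7) and user 2 as (2019, 8), rows that do not exist.
broken = conn.execute(
    """
    SELECT rh1.user_id, rh1.year, rh1.month, rh1.ratio
    FROM ratio_history rh1
    JOIN (SELECT user_id, MAX(year) AS maxYear, MAX(month) AS maxMonth
          FROM ratio_history
          GROUP BY user_id) rh2
      ON rh1.user_id = rh2.user_id
     AND rh1.year = rh2.maxYear
     AND rh1.month = rh2.maxMonth
    """
).fetchall()

# Number the rows per user from newest to oldest and keep the first one.
latest = conn.execute(
    """
    SELECT user_id, year, month, ratio
    FROM (SELECT rh.*,
                 ROW_NUMBER() OVER (PARTITION BY user_id
                                    ORDER BY year DESC, month DESC) AS rn
          FROM ratio_history rh)
    WHERE rn = 1
    ORDER BY user_id
    """
).fetchall()

print(broken)
print(latest)
```

Only user 3 happens to have its largest month in its largest year, which is why the original join matched exactly one row.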

How to subtract previous value in a column with calculation of other column on SQL server

I have a requirement for a table as shown below, with the columns mgt_year, tot_dflt_mgt and tot_accum_mgt. For year 2016, tot_dflt_mgt is 20 and the accumulated value is 600. What I want is that when I compute
(tot_accum_mgt - tot_dflt_mgt)
the result is placed in the previous year's row, as shown in the table below. That result, 580, then has the 2015 value of 9 subtracted from it (580 - 9), and so on for all earlier years. I have done this in Excel and also in Oracle thanks to @mathguy, but how can I achieve this result in SQL Server? I tried to use the Oracle query in SQL Server, but it is not working.
Please forgive my bad English and noob formatting.
My table t:
line_seg MGT_YEAR TOT_DFLT_MGT TOT_ACCUM_MGT
--------- -------- ------------ ------------
A 2013 10
A 2014 15
A 2015 9
A 2016 20 600
B 2013 12
B 2014 15
B 2015 8
B 2016 20 500
Oracle Solution:
select mgt_year, tot_dflt_mgt,
max(tot_accum_mgt) over () -
nvl( sum(tot_dflt_mgt) over
(order by mgt_year
rows between 1 following and unbounded following)
, 0 ) as tot_accum_mgt
from t;
but I am unable to use this in SQL Server.
Required output:
line_seg MGT_YEAR TOT_DFLT_MGT TOT_ACCUM_MGT
--------- -------- ------------ ------------
A 2013 10 556
A 2014 15 571
A 2015 9 580
A 2016 20 600
B 2013 12 457
B 2014 15 472
B 2015 8 480
B 2016 20 500
select *,
(sum(TOT_ACCUM_MGT) over()) -
(sum(TOT_DFLT_MGT ) over (order by TOT_DFLT_MGT )) as somecolname
from
table
Use ROW_NUMBER() and self-join each row with the previous row on (a.ID = b.ID) and (a.row_num = b.row_num - 1)
OR
You can use the LAG() function.
Please try the following query. I assumed that you are using SQL Server 2012 or later. If not, please change FIRST_VALUE to SUM:
SELECT t1.line_seg, t1.mgt_year, t1.[tot_dflt_mgt]
, FIRST_VALUE(t1.tot_accum_mgt) OVER(PARTITION BY t1.[line_seg] ORDER BY t1.mgt_year DESC)
- ISNULL(SUM(t2.[tot_dflt_mgt]) OVER(PARTITION BY t2.[line_seg] ORDER BY t2.mgt_year DESC), 0) AS tot_accum_mgt
FROM [dbo].[t] AS t1
LEFT JOIN [dbo].[t] AS t2 ON (t2.line_seg = t1.line_seg AND t2.mgt_year = t1.mgt_year + 1)
ORDER BY t1.line_seg, t1.mgt_year ASC;
To do this, I first imagine the table sorted in descending order of mgt_year:
+------------+----------+--------------+---------------+
| line_seg | mgt_year | tot_dflt_mgt | tot_accum_mgt |
+------------+----------+--------------+---------------+
| A | 2016 | 20 | 600 |
| A | 2015 | 9 | NULL |
| A | 2014 | 15 | NULL |
| A | 2013 | 10 | NULL |
| B | 2016 | 20 | 500 |
| B | 2015 | 8 | NULL |
| B | 2014 | 15 | NULL |
| B | 2013 | 12 | NULL |
+------------+----------+--------------+---------------+
Then all I have to do is subtract the PREVIOUS running total of tot_dflt_mgt from the latest year's tot_accum_mgt. This is equivalent to subtracting the previous row's tot_dflt_mgt from that row's computed tot_accum_mgt. To reach the previous row's fields (the following year, since the order is descending), a LEFT JOIN self-joins the table, resulting in the following table:
+------------+----------+--------------+---------------+------------+----------+--------------+---------------+
| line_seg | mgt_year | tot_dflt_mgt | tot_accum_mgt | line_seg | mgt_year | tot_dflt_mgt | tot_accum_mgt |
+------------+----------+--------------+---------------+------------+----------+--------------+---------------+
| A | 2013 | 10 | NULL | A | 2014 | 15 | NULL |
| A | 2014 | 15 | NULL | A | 2015 | 9 | NULL |
| A | 2015 | 9 | NULL | A | 2016 | 20 | 600 |
| A | 2016 | 20 | 600 | NULL | NULL | NULL | NULL |
| B | 2013 | 12 | NULL | B | 2014 | 15 | NULL |
| B | 2014 | 15 | NULL | B | 2015 | 8 | NULL |
| B | 2015 | 8 | NULL | B | 2016 | 20 | 500 |
| B | 2016 | 20 | 500 | NULL | NULL | NULL | NULL |
+------------+----------+--------------+---------------+------------+----------+--------------+---------------+
The AND t2.mgt_year = t1.mgt_year + 1 condition in the LEFT JOIN clause does the trick of getting the previous row's values. Now all I have to do is calculate the running total over these previous rows (t2). Also, subtracting NULL from anything results in NULL, so ISNULL replaces any NULL with zero.
ISNULL(SUM(t2.[tot_dflt_mgt]) OVER(PARTITION BY t2.[line_seg] ORDER BY t2.mgt_year DESC), 0) AS tot_accum_mgt
Now that we have the previous running total of tot_dflt_mgt, all we have to do is subtract it from the latest (largest mgt_year) tot_accum_mgt. We get that value by using the FIRST_VALUE function. SUM could also be used instead, I guess.
FIRST_VALUE(t1.tot_accum_mgt) OVER(PARTITION BY t1.[line_seg] ORDER BY t1.mgt_year DESC)
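As a cross-check of the arithmetic, here is a sketch of the same idea written without the self join. It keeps the window frame from the Oracle query (ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), adds PARTITION BY line_seg, and replaces nvl with COALESCE. It runs against SQLite through Python's sqlite3 module (SQLite 3.25 or later), which is an assumption made for the example. SQL Server 2012 and later accept the same frame clause, so the statement should carry over, but that has not been tested here. The B/2013 value of 12 is taken from the answer's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (line_seg TEXT, mgt_year INTEGER, tot_dflt_mgt INTEGER, tot_accum_mgt INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("A", 2013, 10, None), ("A", 2014, 15, None), ("A", 2015, 9, None), ("A", 2016, 20, 600),
    ("B", 2013, 12, None), ("B", 2014, 15, None), ("B", 2015, 8, None), ("B", 2016, 20, 500),
])

result = conn.execute(
    """
    SELECT line_seg, mgt_year, tot_dflt_mgt,
           -- the only non-NULL accumulated value in the segment (the latest year)
           MAX(tot_accum_mgt) OVER (PARTITION BY line_seg)
           -- minus everything defaulted in the years AFTER the current row
           - COALESCE(SUM(tot_dflt_mgt) OVER (PARTITION BY line_seg
                                              ORDER BY mgt_year
                                              ROWS BETWEEN 1 FOLLOWING
                                                       AND UNBOUNDED FOLLOWING), 0)
             AS tot_accum_mgt
    FROM t
    ORDER BY line_seg, mgt_year
    """
).fetchall()

for row in result:
    print(row)
```

For segment A this gives 600 for 2016, 600 - 20 = 580 for 2015, 580 - 9 = 571 for 2014 and 571 - 15 = 556 for 2013. The last row of each segment has an empty frame, so SUM returns NULL there and COALESCE turns it into 0.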

How do I do multiple selection based on a flowchart of criteria?

Table name: Copies
+----------+-------+----------+---------+--------------+-------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------+-------+----------+---------+--------------+-------------+
| 900      | 1     | null     | Y       | 7            | May16       |
| 900      | 2     | null     | Y       | 3            | Oct 16      |
| 900      | 3     | null     | N       | 9            | Oct 16      |
| 901      | 4     | 378      | Y       | 3            | Oct 16      |
| 901      | 5     | null     | N       | 2            | Oct 16      |
| 902      | 6     | null     | N       | 5            | May16       |
| 902      | 7     | null     | N       | 9            | Oct 16      |
| 903      | 8     | null     | Y       | 3            | Oct 16      |
| 903      | 9     | null     | Y       | 3            | May16       |
| 904      | 10    | null     | N       | 0            | May 16      |
| 904      | 11    | null     | N       | 0            | May16       |
+----------+-------+----------+---------+--------------+-------------+
Output table
+----------+-------+----------+---------+--------------+-------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------+-------+----------+---------+--------------+-------------+
| 900      | 1     | null     | Y       | 7            | May16       |
| 902      | 7     | null     | N       | 9            | Oct 16      |
| 903      | 8     | null     | Y       | 3            | Oct 16      |
+----------+-------+----------+---------+--------------+-------------+
Hi all, I need help with a query that returns one record within a group based on the importance of the field. The importance is ranked as follows:
previous- if any record within the group_id is not null, then no record within that group_id is returned (because according to our rules, all records within a group should have the same previous value)
in_this- If one record is Y, and the other is N within a group_id, then we keep the Y; If all records are Y or all are N, then we move to the next attribute
Higher_value- If all records in the ‘in_this’ field are equal, then we need to select the record with the greater value from this field. If both records have an equal value, we move to the next attribute
Most_recent- If all records were of equal value in the ‘higher_value’ field, then we consider the newest record. If these are equal, then nothing is returned.
This is a simplified version of the table I am looking at, but I just would like to get the gist of how something like this would work. Basically, my table has multiple copies of records that have been grouped through some algorithm. I have been tasked with selecting which of these records within a group is the ‘good’ one, and we are basing this on these fields.
I’d like the output to actually show all fields, because I will likely attempt to refine the query to include other fields (there are over 40 to consider), but the most important is the group_id and my_id fields. It would be neat if we could also somehow flag why each record got picked, but that isn’t necessary.
It seems like something like this should be easy, but I have a hard time wrapping my head around how to pick from within a group_id. Thanks for your help.
You can use analytic functions for this. The trick is establishing the right variables for each condition:
select t.*
from (select t.*,
max(in_this) over (partition by group_id) as max_in_this,
min(higher_value) over (partition by group_id) as min_higher_value,
max(higher_value) over (partition by group_id) as max_higher_value,
row_number() over (partition by group_id, higher_value order by my_id) as seqnum_ghv,
min(most_recent) over (partition by group_id) as min_most_recent,
max(most_recent) over (partition by group_id) as max_most_recent,
row_number() over (partition by group_id order by most_recent) as seqnum_mr
from t
) t
where max_in_this is not null and
( (min_higher_value <> max_higher_value and seqnum_ghv = 1) or
(min_higher_value = max_higher_value and min_most_recent <> max_most_recent and seqnum_mr = 1
)
);
The third condition as stated makes no sense, but you should get the idea for how to implement this.
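An alternative way to express the priority rules is to rank the rows of each group by the three tie-breakers in order, then keep a row only when it is the sole top-ranked row and the group has no non-null previous value. The sketch below is a different technique from the min/max comparison above, not a transcription of it. It assumes Python's sqlite3 module (SQLite 3.25 or later), a table named copies, and most_recent stored as sortable 'YYYY-MM' strings, reading "May16" and "Oct 16" as May and October 2016.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE copies (group_id INTEGER, my_id INTEGER, previous INTEGER, "
    "in_this TEXT, higher_value INTEGER, most_recent TEXT)"
)
conn.executemany("INSERT INTO copies VALUES (?, ?, ?, ?, ?, ?)", [
    (900, 1, None, "Y", 7, "2016-05"), (900, 2, None, "Y", 3, "2016-10"),
    (900, 3, None, "N", 9, "2016-10"), (901, 4, 378, "Y", 3, "2016-10"),
    (901, 5, None, "N", 2, "2016-10"), (902, 6, None, "N", 5, "2016-05"),
    (902, 7, None, "N", 9, "2016-10"), (903, 8, None, "Y", 3, "2016-10"),
    (903, 9, None, "Y", 3, "2016-05"), (904, 10, None, "N", 0, "2016-05"),
    (904, 11, None, "N", 0, "2016-05"),
])

picked = conn.execute(
    """
    WITH ranked AS (
        SELECT c.*,
               -- COUNT(previous) ignores NULLs, so > 0 means some row has a previous value
               COUNT(previous) OVER (PARTITION BY group_id) AS n_previous,
               -- 'Y' sorts after 'N', so DESC puts Y first; ties share the same rank
               RANK() OVER (PARTITION BY group_id
                            ORDER BY in_this DESC, higher_value DESC, most_recent DESC) AS rnk
        FROM copies c
    ),
    flagged AS (
        SELECT r.*,
               SUM(CASE WHEN rnk = 1 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY group_id) AS n_top
        FROM ranked r
    )
    SELECT group_id, my_id, previous, in_this, higher_value, most_recent
    FROM flagged
    WHERE n_previous = 0   -- rule 1: skip groups where any previous is set
      AND rnk = 1          -- rules 2-4: best row by the ordered tie-breakers
      AND n_top = 1        -- a full tie on every attribute returns nothing
    ORDER BY group_id
    """
).fetchall()

for row in picked:
    print(row)
```

On the sample data this keeps my_id 1, 7 and 8, matching the output table. Adding further tie-breakers only means extending the ORDER BY list inside RANK().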

A single query to count the number of distinct rows in one table and the highest value of a column from another table

I have two SQL tables. Table 1 is as follows:
SALESREF
1 | 40303020
2 | 40303021
3 | 40303021
4 | 40303021
5 | 41210028
6 | 4120302701
7 | 41210030
8 | 4112700803
9 | 4112700803
10 | 41215030
11 | 41215026
12 | 41215026
13 | 41215026
14 | 41215026
15 | 41215026
16 | 41215026
17 | 41215026
18 | 41215027
19 | 41215027
20 | 41215027
Table 2 ("LEDGER") is as follows:
SALESREF SALEDATE
0 | 4081200201 | 20140804
1 | 40303020 | 20141015
2 | 40303021 | 20141017
3 | 40303021 | 20141017
4 | 40303021 | 20141017
5 | 41210028 | 20121214
6 | 4120302701 | 20130926
7 | 41210030 | 20130926
8 | 4112700803 | 20131107
9 | 4112700803 | 20131107
10 | 41215030 | 20120720
What I am looking for is a single line that outputs the following:
TotalDistinctSalesRefsInTable1 HighestSaleDateValueInTable2 (that has a matching value in table 1)
9 20141017
That is, the total number of distinct SALESREF values in table 1 and the latest SALEDATE value from table 2.
I've tried selecting within a query but quickly found the limitation of my knowledge although I know I can get the latest overall sale date by doing:
SELECT MAX(LEDGER.SALEDATE) AS LAST_DATE FROM LEDGER
I just need help piecing the whole thing together.
You can use a left join, COUNT and MAX to get your desired result:
select count(distinct t1.salesref) as TotalDistinctSalesRefsInTable1,
ifnull(max(l.saledate),0) as HighestSaleDateValueInTable
from table1 t1
left join ledger l
on t1.salesref = l.salesref
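The query can be checked against the sample data with a small sketch. It assumes Python's sqlite3 module, which also provides IFNULL, and the table names table1 and ledger used in the answer. The leading row numbers from the question are left out as they are not needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (salesref INTEGER)")
conn.execute("CREATE TABLE ledger (salesref INTEGER, saledate INTEGER)")

table1_refs = (
    [40303020] + [40303021] * 3 + [41210028, 4120302701, 41210030]
    + [4112700803] * 2 + [41215030] + [41215026] * 7 + [41215027] * 3
)
conn.executemany("INSERT INTO table1 VALUES (?)", [(ref,) for ref in table1_refs])
conn.executemany("INSERT INTO ledger VALUES (?, ?)", [
    (4081200201, 20140804), (40303020, 20141015), (40303021, 20141017),
    (40303021, 20141017), (40303021, 20141017), (41210028, 20121214),
    (4120302701, 20130926), (41210030, 20130926), (4112700803, 20131107),
    (4112700803, 20131107), (41215030, 20120720),
])

# The LEFT JOIN keeps every table1 row, so COUNT(DISTINCT ...) still sees refs
# with no ledger entry, while MAX only considers ledger rows that matched.
total_refs, last_date = conn.execute(
    """
    SELECT COUNT(DISTINCT t1.salesref) AS TotalDistinctSalesRefsInTable1,
           IFNULL(MAX(l.saledate), 0)  AS HighestSaleDateValueInTable2
    FROM table1 t1
    LEFT JOIN ledger l
           ON t1.salesref = l.salesref
    """
).fetchone()

print(total_refs, last_date)
```

The join multiplies rows where a reference appears several times on both sides, but COUNT(DISTINCT ...) and MAX are unaffected by duplicates, so the single output row is still correct.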