SQL script to partition data on a column and return the max value [duplicate]

I have a requirement to compute bonus payout based on spread goal and date achieved as follows:
Spread Goal | Date Achieved | Bonus Payout
----------------------------------------------
$3,500 | < 27 wks | $2,000
$3,500 | 27 wks to 34 wks | $1,000
$3,500 | > 34 wks | $0
I have a table in SQL Server 2014 where the subset of the data is as follows:
EMP_ID WK_NUM NET_SPRD_LCL
123 10 0
123 11 1500
123 15 3600
123 18 3800
123 19 4000
Based on the requirement, I need to look for records where NET_SPRD_LCL is greater than or equal to 3500 during 2 continuous wk_num.
So, in my example, WK_NUM 15 and 18 (which in my case are continuous because I have a calendar table that I join to to exclude the holiday weeks) are less than 27 wks and have NET_SPRD_LCL > 3500.
For this case, I want to output the MAX(WK_NUM), its associated NET_SPRD_LCL and BONUSPAYOUT = 2000. So, the output should be as follows:
EMP_ID WK_NUM NET_SPRD_LCL BONUSPAYOUT
123 18 3800 2000
If this meets the first requirement, the script should output and quit. If not, then I will look for the second requirement where Date Achieved is between 27 wks to 34 wks.
I hope I was able to explain my requirement clearly :-)
Thanks for the help.

Nice question! The tricky part was handling runs such as four rows in a row at 3500 or more, and this is what I came up with.
You can use a CTE, a recursive CTE and ROW_NUMBER():
;WITH cte AS(
SELECT EMP_ID,
WK_NUM,
NET_SPRD_LCL,
ROW_NUMBER() OVER (PARTITION BY EMP_ID ORDER BY WK_NUM) rn
FROM YourTable
)
, recur AS (
SELECT EMP_ID,
WK_NUM,
NET_SPRD_LCL,
rn,
1 as lev
FROM cte
WHERE rn = 1
UNION ALL
SELECT c.EMP_ID,
c.WK_NUM,
c.NET_SPRD_LCL,
c.rn,
CASE WHEN c.NET_SPRD_LCL < 3500 THEN Lev+1 ELSE Lev END
FROM cte c
INNER JOIN recur r
ON r.rn+1 = c.rn
)
SELECT TOP 1 WITH TIES
EMP_ID,
WK_NUM,
NET_SPRD_LCL,
CASE WHEN WK_NUM < 27 THEN $2000
WHEN WK_NUM between 27 and 34 THEN $1000
ELSE $0 END as Bonus
FROM recur
WHERE NET_SPRD_LCL >= 3500
ORDER BY ROW_NUMBER() OVER(PARTITION BY EMP_ID,lev ORDER BY WK_NUM)%2
Output for data you provided:
EMP_ID WK_NUM NET_SPRD_LCL Bonus
123 18 3800 2000,00
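For readers who want to check the rule outside the database, here is a minimal Python sketch of the requirement as I read it: find the first pair of adjacent rows (holiday weeks assumed already filtered out) that both meet the goal, and report the later row with its bonus tier. It is not a translation of the recursive CTE above, and the names ROWS, GOAL, bonus_for_week and first_qualifying_row are made up for illustration.

```python
# Rows for one employee as (emp_id, wk_num, net_sprd_lcl), ordered by week.
# Assumption: holiday weeks are already excluded, so adjacent rows are
# treated as consecutive weeks.
ROWS = [
    (123, 10, 0),
    (123, 11, 1500),
    (123, 15, 3600),
    (123, 18, 3800),
    (123, 19, 4000),
]
GOAL = 3500


def bonus_for_week(wk_num):
    """Bonus tiers taken from the table in the question."""
    if wk_num < 27:
        return 2000
    if wk_num <= 34:
        return 1000
    return 0


def first_qualifying_row(rows, goal=GOAL):
    """Return (emp_id, wk_num, spread, bonus) for the second row of the first
    adjacent pair that both meet the goal, or None if no such pair exists."""
    for prev, cur in zip(rows, rows[1:]):
        same_emp = prev[0] == cur[0]
        if same_emp and prev[2] >= goal and cur[2] >= goal:
            emp_id, wk_num, spread = cur
            return (emp_id, wk_num, spread, bonus_for_week(wk_num))
    return None


print(first_qualifying_row(ROWS))  # (123, 18, 3800, 2000)
```

Because the bonus is derived from the week in which the goal was reached, the "try the first tier, then the second" flow in the question collapses into a single lookup on that week.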

Related

Frequency of Address changes in number of days SQL

Hi I'm trying to find out how frequently a business would change their address. I've got two tables one with trading address and the other with office address. The complicated part is one id will have several sequence numbers. I need to find out the difference between one address's create date and another address create date.
Trading address table
ID | Create_date | Seq_no | Address
1 | 2002-03-23 | 1 | 20 bottle way
1 | 2002-05-23 | 2 | 12 sunset blvd
2 | 2003-01-14 | 1 | 76 moonrise ct
Office address table
ID | Create_date | Seq_no | Address
1 | 2004-02-13 | 1 | 12 paper st
2 | 2005-03-01 | 1 | 30 pencil way
2 | 2005-04-01 | 2 | 25 mouse rd
2 | 2005-08-01 | 3 | 89 glass cct
My result set will be
Difference | NumberOfID's
30 days | 1
60 days | 1
120 days | 1
Other | 2

I think I solved it. The steps are:
Union the two tables and add a separate column holding the actual sequence number within the combined set.
Use the LEAD function to bring the next create date onto each row.
Take the date difference to get the number of days between consecutive addresses of the same ID.
Use a CASE expression to categorize the day counts, then count the rows in each category.
WITH BASE AS (
SELECT ID,SEQ_NO,CREATE_DATE
FROM TradingAddress
UNION ALL
SELECT ID,SEQ_NO,CREATE_DATE
FROM OfficeAddress
),
WORKINGS AS (
SELECT ID,CREATE_DATE,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY CREATE_DATE ASC) AS SNO,
LEAD(CREATE_DATE) OVER (PARTITION BY ID ORDER BY CREATE_DATE) AS REF_DATE,
DATEDIFF(DAY,CREATE_DATE,LEAD(CREATE_DATE) OVER (PARTITION BY ID ORDER BY CREATE_DATE)) AS DATE_DIFFERENCE
FROM BASE
),
WORKINGS_2 AS (
SELECT *,
CASE WHEN DATE_DIFFERENCE BETWEEN 1 AND 30 THEN '1-30 DAYS'
WHEN DATE_DIFFERENCE BETWEEN 31 AND 60 THEN '31-60 DAYS'
WHEN DATE_DIFFERENCE BETWEEN 61 AND 90 THEN '61-90 DAYS'
WHEN DATE_DIFFERENCE BETWEEN 91 AND 120 THEN '91-120 DAYS'
ELSE 'MORE THAN 120 DAYS'
END AS DIFFERENCE_DAYS
FROM WORKINGS
WHERE REF_DATE IS NOT NULL
)
SELECT DIFFERENCE_DAYS,COUNT(DIFFERENCE_DAYS) AS NUMBEROFIDS
FROM WORKINGS_2
GROUP BY DIFFERENCE_DAYS
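The steps above can be sketched in Python to sanity-check the buckets against the sample data. This is only an illustration of the same idea (union, order per ID, difference to the next date, bucket, count); the names TRADING, OFFICE, bucket and change_frequency are hypothetical, and the bucket labels are copied from the query.

```python
from collections import Counter
from datetime import date
from itertools import groupby

# (id, create_date) pairs copied from the two sample tables in the question.
TRADING = [(1, date(2002, 3, 23)), (1, date(2002, 5, 23)), (2, date(2003, 1, 14))]
OFFICE = [
    (1, date(2004, 2, 13)),
    (2, date(2005, 3, 1)),
    (2, date(2005, 4, 1)),
    (2, date(2005, 8, 1)),
]

# Same ranges and labels as the CASE expression in the query.
BUCKETS = [
    (1, 30, "1-30 DAYS"),
    (31, 60, "31-60 DAYS"),
    (61, 90, "61-90 DAYS"),
    (91, 120, "91-120 DAYS"),
]


def bucket(days):
    """Map a day count to its label; anything outside 1..120 falls through
    to the last label, exactly like the ELSE branch of the CASE."""
    for low, high, label in BUCKETS:
        if low <= days <= high:
            return label
    return "MORE THAN 120 DAYS"


def change_frequency(*tables):
    """Union the tables, then count gaps between consecutive dates per id."""
    rows = sorted(row for table in tables for row in table)  # UNION ALL + ORDER BY
    counts = Counter()
    for _, group in groupby(rows, key=lambda r: r[0]):
        dates = [d for _, d in group]
        for earlier, later in zip(dates, dates[1:]):  # LEAD(create_date)
            counts[bucket((later - earlier).days)] += 1
    return dict(counts)


print(change_frequency(TRADING, OFFICE))
```

On the sample rows this yields one gap of 61 days, one of 31 days, and three gaps longer than 120 days, so the counts are per address change rather than per distinct ID.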

You can do it this way:
SELECT DATEDIFF(day, t1.create_date, t2.create_date) AS days_between,
COUNT(*) AS ids
FROM test1 t1
JOIN test2 t2 ON t1.id = t2.id
GROUP BY DATEDIFF(day, t1.create_date, t2.create_date)

Vertica SQL for running count distinct and running conditional count

I'm trying to build a department level score table based on a deeper product url level score table.
Date is not consecutive
Not all urls got score updates at same day (independent to each other)
dist_url should be running count distinct (cumulative count distinct)
dist urls and urls score >=30 are both count distinct
What I have now is:
Date url Store Dept Page Score
10/1 a US A X 10
10/1 b US A X 30
10/1 c US A X 60
10/4 a US A X 20
10/4 d US A X 60
10/6 b US A X 22
10/9 a US A X 40
10/9 e US A X 10
Desired output:
Date Store Dept Page dist urls urls score >=30
10/1 US A X 3 2
10/4 US A X 4 3
10/6 US A X 4 2
10/9 US A X 5 2
I think the dist_url can be done by using window function, just not sure on query.
Current query is as below, but it's wrong since not cumulative count distinct:
SELECT
bm.AnalysisDate,
su.SoID AS Store,
su.DptCaID AS DTID,
su.PageTypeID AS PTID,
COUNT(DISTINCT bm.SeoURLID) AS NumURLsWithDupScore,
SUM(CASE WHEN bm.DuplicationScore > 30 THEN 1 ELSE 0 END) AS Over30Count
FROM csn_seo.tblBotifyMetrics bm
INNER JOIN csn_seo.tblSEOURLs su
ON bm.SeoURLID = su.ID
WHERE su.DptCaID IS NOT NULL
AND su.DptCaID <> 0
AND su.PageTypeID IS NOT NULL
AND su.PageTypeID <> -1
AND bm.iscompliant = 1
GROUP BY bm.AnalysisDate, su.SoID, su.DptCaID, su.PageTypeID;
Please let me know if anyone has any idea.

Based on your question, you seem to want two levels of logic:
select date, store, dept,
sum(sum(start)) over (partition by dept, page order by date) as distinct_urls,
sum(sum(start_30)) over (partition by dept, page order by date) as distinct_urls_30
from ((select store, dept, page, url, min(date) as date, 1 as start, 0 as start_30
from t
group by store, dept, page, url
) union all
(select store, dept, page, url, min(date) as date, 0, 1
from t
where score >= 30
group by store, dept, page, url
)
) t
group by date, store, dept, page;
I don't understand how your query is related to your question.
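To make the "first appearance" idea in the query above concrete, here is a small Python sketch under stated assumptions: the year 2019 is made up (the question gives only month/day), store, dept and page are constant in the sample so they are left out, and the function name running_counts is hypothetical. Each URL is counted once, on the first date it appears, and once more on the first date its score reaches the threshold.

```python
# (date, url, score) rows from the question; the year is an assumption.
ROWS = [
    ("2019-10-01", "a", 10),
    ("2019-10-01", "b", 30),
    ("2019-10-01", "c", 60),
    ("2019-10-04", "a", 20),
    ("2019-10-04", "d", 60),
    ("2019-10-06", "b", 22),
    ("2019-10-09", "a", 40),
    ("2019-10-09", "e", 10),
]


def running_counts(rows, threshold=30):
    """Return (date, distinct urls so far, urls that have reached threshold)."""
    first_seen = {}  # url -> first date it appears at all
    first_over = {}  # url -> first date its score is >= threshold
    for day, url, score in sorted(rows):
        first_seen.setdefault(url, day)
        if score >= threshold:
            first_over.setdefault(url, day)

    result = []
    for day in sorted({r[0] for r in rows}):
        seen = sum(1 for d in first_seen.values() if d <= day)
        over = sum(1 for d in first_over.values() if d <= day)
        result.append((day, seen, over))
    return result


for row in running_counts(ROWS):
    print(row)
```

The first count comes out as 3, 4, 4, 5, matching the "dist urls" column in the question. The second count follows the "has ever reached 30" reading and gives 2, 3, 3, 4, which does not reproduce the asker's last column, so the intended rule for that column remains unclear.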

Try as I might, I can't reproduce your output either.
But I think you can avoid the UNION of SELECTs. Does this do what you expect?
NULLs are ignored by COUNT(DISTINCT ...), and here you can combine an aggregate expression with an OLAP one.
Vertica also has named windows, which improve readability.
WITH
input(Date,url,Store,Dept,Page,Score) AS (
SELECT DATE '2019-10-01','a','US','A','X',10
UNION ALL SELECT DATE '2019-10-01','b','US','A','X',30
UNION ALL SELECT DATE '2019-10-01','c','US','A','X',60
UNION ALL SELECT DATE '2019-10-04','a','US','A','X',20
UNION ALL SELECT DATE '2019-10-04','d','US','A','X',60
UNION ALL SELECT DATE '2019-10-06','b','US','A','X',22
UNION ALL SELECT DATE '2019-10-09','a','US','A','X',40
UNION ALL SELECT DATE '2019-10-09','e','US','A','X',10
)
SELECT
date
, store
, dept
, page
, SUM(COUNT(DISTINCT url) ) OVER(w) AS dist_urls
, SUM(COUNT(DISTINCT CASE WHEN score >=30 THEN url END)) OVER(w) AS dist_urls_gt_30
FROM input
GROUP BY
date
, store
, dept
, page
WINDOW w AS (PARTITION BY store,dept,page ORDER BY date)
;
-- out date | store | dept | page | dist_urls | dist_urls_gt_30
-- out ------------+-------+------+------+-----------+-----------------
-- out 2019-10-01 | US | A | X | 3 | 2
-- out 2019-10-04 | US | A | X | 5 | 3
-- out 2019-10-06 | US | A | X | 6 | 3
-- out 2019-10-09 | US | A | X | 8 | 4
-- out (4 rows)
-- out
-- out Time: First fetch (4 rows): 45.321 ms. All rows formatted: 45.364 ms

Combining Two Tables & Summing REV amts by Mth

Below are my two tables of data
Acct BillingDate REV
101 01/05/2018 5
101 01/30/2018 4
102 01/15/2018 2
103 01/4/2018 3
103 02/05/2018 2
106 03/06/2018 5
Acct BillingDate Lease_Rev
101 01/15/2018 2
102 01/16/2018 1
103 01/19/2018 2
104 02/05/2018 3
105 04/02/2018 1
Desired Output
Acct | Jan | Feb | Mar | Apr
101 | 11 | | |
102 | 3 | | |
103 | 5 | 2 | |
104 | | 3 | |
105 | | | | 1
106 | | | 5 |
My SQL Script is Below:
SELECT [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,SUM(case when [NewSalesHistory].[billingdate] between '6/1/2016' and '6/30/2016' then REV else 0 end ) + [X].[Jun-16] AS 'Jun-16'
FROM [NewSalesHistory]
FULL join (SELECT [Account]
,SUM(case when [BWLease].[billingdate] between '6/1/2016' and '6/30/2016' then Lease_REV else 0 end ) as 'Jun-16'
FROM [AirgasPricing].[dbo].[BWLease]
GROUP BY [Account]) X ON [NewSalesHistory].[Account] = [X].[Account]
GROUP BY [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,[X].[Jun-16]
I am having trouble combining these tables. If there is a rev amt and lease rev amt then it will combine (sum) for that account. If there is not a lease rev amt (which is the majority of the time), it brings back NULLs for all other rev amts accounts in Table 1. Table one can have duplicate accounts with different Rev, while the Table two is one unique account only w Lease rev. The output above is how I would like to see the data.
What am I missing here? Thanks!

I would suggest union all and group by:
select acct,
sum(case when billingdate >= '2016-01-01' and billingdate < '2016-02-01' then rev end) as rev_201601,
sum(case when billingdate >= '2016-02-01' and billingdate < '2016-03-01' then rev end) as rev_201602,
. . .
from ((select nsh.acct, nsh.billingdate, nsh.rev
from NewSalesHistory nsh
) union all
(select bl.acct, bl.billingdate, bl.lease_rev as rev
from AirgasPricing..BWLease bl
)
) x
group by acct;
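As a rough cross-check of the union-then-aggregate approach, the following Python sketch stacks the two sample tables and sums per account and month. The names SALES, LEASES and monthly_totals are illustrative only; the rows are copied from the question.

```python
from collections import defaultdict
from datetime import date

# (acct, billing_date, amount) rows from the two sample tables.
SALES = [
    (101, date(2018, 1, 5), 5),
    (101, date(2018, 1, 30), 4),
    (102, date(2018, 1, 15), 2),
    (103, date(2018, 1, 4), 3),
    (103, date(2018, 2, 5), 2),
    (106, date(2018, 3, 6), 5),
]
LEASES = [
    (101, date(2018, 1, 15), 2),
    (102, date(2018, 1, 16), 1),
    (103, date(2018, 1, 19), 2),
    (104, date(2018, 2, 5), 3),
    (105, date(2018, 4, 2), 1),
]


def monthly_totals(*tables):
    """Stack the tables (UNION ALL), then sum amounts per account and month."""
    totals = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for acct, billing_date, amount in table:
            totals[acct][(billing_date.year, billing_date.month)] += amount
    return {acct: dict(months) for acct, months in sorted(totals.items())}


for acct, months in monthly_totals(SALES, LEASES).items():
    print(acct, months)
```

Stacking the rows before aggregating means accounts that exist in only one table (104, 105, 106 here) still show up, which is the case the FULL JOIN version was losing to NULLs.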

Okay, so there are a few things going on here:
1) As Gordon Linoff mentioned you can perform a union all on the two tables. Be sure to limit your column selections and name your columns appropriately:
select
x as consistentname1,
y as consistentname2,
z as consistentname3
from [NewSalesHistory]
union all
select
a as consistentname1,
b as consistentname2,
c as consistentname3
from [BWLease]
2) Your desired result contains a pivoted month column. Generate a column with your desired granularity on the result of the union in step one. F.ex. months:
concat(datepart(yy, Date_),'-',datename(mm,Date_)) as yyyyM
Then perform aggregation using a group by:
select sum(...) as desiredcolumnname
...
group by PK1, PK2, yyyyM
Finally, PIVOT to obtain your result: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
3) If you have other fields/columns that you wish to present then you first need to determine whether they are measures (can be aggregated) or are dimensions. That may be best addressed in a follow up question after you've achieved what you set out for in this part.
Hope it helps
As an aside, it seems like you are preparing data for reporting. Performing these transformations can be facilitated using a GUI such as MS Power Query. As long as your end goal is not data manipulation in the DB itself, you do not need to resort to raw sql.

Cumulative Compound Interest Calculation (Oracle Database 11g Release 2)

I have a requirement to calculate rolling compound interest on several accounts in PL/SQL, and I am looking for help or advice on how to script these calculations. The values I need are in the last two columns of the output below (INTERESTAMOUNT and RUNNINGTOTAL). I found similar examples on here, but nothing that fits these requirements in PL/SQL. I am also new to CTEs and recursive techniques, and the MODEL technique I found required a fixed number of iterations, which would be variable in this case. Please see my problem below:
Calculations:
INTERESTAMOUNT = (Previous Year RUNNING TOTAL+ Current Year AMOUNT) * INTEREST_RATE
RUNNINGTOTAL = (Previous Year RUNNING TOTAL+ Current Year AMOUNT) * (1 + INTEREST_RATE) - CURRENT YEAR EXPENSES
Input Table:
YEAR ACCT_ID AMOUNT INTEREST_RATE EXPENSES
2002 1 1000 0.05315 70
2003 1 1500 0.04213 80
2004 1 800 0.03215 75
2005 1 950 0.02563 78
2000 2 750 0.07532 79
2001 2 600 0.06251 75
2002 2 300 0.05315 70
Desired Output:
YEAR ACCT_ID AMOUNT INTEREST_RATE EXPENSES INTERESTAMOUNT RUNNINGTOTAL
2002 1 1000 0.05315 70 53.15 983.15
2003 1 1500 0.04213 80 104.62 2507.77
2004 1 800 0.03215 75 106.34 3339.11
2005 1 950 0.02563 78 109.93 4321.04
2000 2 750 0.07532 79 56.49 727.49
2001 2 600 0.06251 75 82.98 1335.47
2002 2 300 0.05315 70 86.93 1652.4

One way to do it is with a recursive cte.
with rownums as (select t.*
,row_number() over(partition by acct_id order by yr) as rn
from t) -- t is your tablename
,cte(rn,yr,acct_id,amount,interest_rate,expenses,running_total,interest_amount) as
(select rn,yr,acct_id,amount,interest_rate,expenses
,(amount*(1+interest_rate))-expenses
,amount*interest_rate
from rownums
where rn=1
union all
select t.rn,t.yr,t.acct_id,t.amount,t.interest_rate,t.expenses
,((c.running_total+t.amount)*(1+t.interest_rate))-t.expenses
,(c.running_total+t.amount)*t.interest_rate
from cte c
join rownums t on t.acct_id=c.acct_id and t.rn=c.rn+1
)
select * from cte
Generate row numbers using row_number function
Calculate the interest and running total of the first row for each acct_id (anchor in the recursive cte).
Join every row to the next one (ordered by ascending order of year column) for each account_id and compute the running total and interest for the subsequent rows.
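The recurrence that the recursive CTE walks through can also be written as a plain loop, which is handy for checking the expected figures. This is a sketch only: ROWS holds the input table from the question, the function name compound is made up, and values are rounded to two decimals for display.

```python
from itertools import groupby

# (year, acct_id, amount, interest_rate, expenses) from the question's input table.
ROWS = [
    (2002, 1, 1000, 0.05315, 70),
    (2003, 1, 1500, 0.04213, 80),
    (2004, 1, 800, 0.03215, 75),
    (2005, 1, 950, 0.02563, 78),
    (2000, 2, 750, 0.07532, 79),
    (2001, 2, 600, 0.06251, 75),
    (2002, 2, 300, 0.05315, 70),
]


def compound(rows):
    """Return (year, acct_id, interest_amount, running_total) per input row."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # by account, then year
    for _, group in groupby(ordered, key=lambda r: r[1]):
        running = 0.0  # no previous running total for an account's first year
        for year, acct, amount, rate, expenses in group:
            base = running + amount            # previous total + current amount
            interest = base * rate             # INTERESTAMOUNT
            running = base + interest - expenses  # RUNNINGTOTAL
            out.append((year, acct, round(interest, 2), round(running, 2)))
    return out


for row in compound(ROWS):
    print(row)
```

Starting each account at a running total of zero plays the same role as the anchor member of the CTE, and each loop iteration corresponds to one recursive step.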

Can I use Oracle SQL to plot actual dates from Schedule Information?

I asked this question in regard to SQL Server, but what's the answer for an Oracle environment (10g)?
If I have a table containing schedule information that implies particular dates, is there a SQL statement that can be written to convert that information into actual rows, using something like MSSQL's Common Table Expressions, perhaps?
Consider a payment schedule table with these columns:
StartDate - the date the schedule begins (1st payment is due on this date)
Term - the length in months of the schedule
Frequency - the number of months between recurrences
PaymentAmt - the payment amount :-)
SchedID StartDate Term Frequency PaymentAmt
-------------------------------------------------
1 05-Jan-2003 48 12 1000.00
2 20-Dec-2008 42 6 25.00
Is there a single SQL statement to allow me to go from the above to the following?
SchedID PaymentNum DueDate RunningExpectedTotal
-----------------------------------------------
1 1 05-Jan-2003 1000.00
1 2 05-Jan-2004 2000.00
1 3 05-Jan-2005 3000.00
1 4 05-Jan-2006 4000.00
2 1 20-Dec-2008 25.00
2 2 20-Jun-2009 50.00
2 3 20-Dec-2009 75.00
2 4 20-Jun-2010 100.00
2 5 20-Dec-2010 125.00
2 6 20-Jun-2011 150.00
2 7 20-Dec-2011 175.00
Your thoughts are appreciated.

Oracle actually has syntax for hierarchical queries using the CONNECT BY clause. SQL Server's use of the WITH clause looks like a hack in comparison:
SELECT t.SchedId,
CASE LEVEL
WHEN 1 THEN
t.StartDate
ELSE
ADD_MONTHS(t.StartDate, t.frequency)
END 'DueDate',
CASE LEVEL
WHEN 1 THEN
t.PaymentAmt
ELSE
SUM(t.paymentAmt)
END 'RunningExpectedTotal'
FROM PaymentScheduleTable t
WHERE t.PaymentNum <= t.Term / t.Frequency
CONNECT BY PRIOR t.startdate = t.startdate
GROUP BY t.schedid, t.startdate, t.frequency, t.paymentamt
ORDER BY t.SchedId, t.PaymentNum
I'm not 100% on that - I'm more confident about using:
SELECT t.SchedId,
t.StartDate 'DueDate',
t.PaymentAmt 'RunningExpectedTotal'
FROM PaymentScheduleTable t
WHERE t.PaymentNum <= t.Term / t.Frequency
CONNECT BY PRIOR t.startdate = t.startdate
ORDER BY t.SchedId, t.PaymentNum
...but it doesn't include the logic to handle when you're dealing with the 2nd+ entry in the chain to add months & sum the amounts. The summing could be done with GROUP BY CUBE or ROLLUP depending on the detail needed.

I don't understand why there would be 5 payment days for schedid = 1 and 7 for schedid = 2.
48 / 12 = 4 and 42 / 6 = 7, so I expected 4 payment days for schedid = 1.
Anyway, I used the model clause:
create table PaymentScheduleTable
( schedid number(10)
, startdate date
, term number(3)
, frequency number(3)
, paymentamt number(5)
);
insert into PaymentScheduleTable
values (1,to_date('05-01-2003','dd-mm-yyyy')
, 48
, 12
, 1000);
insert into PaymentScheduleTable
values (2,to_date('20-12-2008','dd-mm-yyyy')
, 42
, 6
, 25);
commit;
And now the select with model clause:
select schedid, to_char(duedate,'dd-mm-yyyy') duedate, expected, i paymentnum
from paymentscheduletable
model
partition by (schedid)
dimension by (1 i)
measures (
startdate duedate
, paymentamt expected
, term
, frequency)
rules
( expected[for i from 1 to term[1]/frequency[1] increment 1]
= nvl(expected[cv()-1],0) + expected[1]
, duedate[for i from 1 to term[1]/frequency[1] increment 1]
= add_months(duedate[1], (cv(i)-1) * frequency[1])
)
order by schedid,i;
This outputs:
SCHEDID DUEDATE EXPECTED PAYMENTNUM
---------- ---------- ---------- ----------
1 05-01-2003 1000 1
1 05-01-2004 2000 2
1 05-01-2005 3000 3
1 05-01-2006 4000 4
2 20-12-2008 25 1
2 20-06-2009 50 2
2 20-12-2009 75 3
2 20-06-2010 100 4
2 20-12-2010 125 5
2 20-06-2011 150 6
2 20-12-2011 175 7
11 rows selected.
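The row-generation logic (term / frequency payments, each frequency months after the previous one, with a running total) can be sketched outside the database as follows. The add_months helper here is a simplified stand-in, not Oracle's ADD_MONTHS: it clamps to the last day of the target month but does not copy Oracle's end-of-month rule. SCHEDULES and expand are hypothetical names.

```python
import calendar
from datetime import date

# (sched_id, start_date, term_months, frequency_months, payment_amt)
SCHEDULES = [
    (1, date(2003, 1, 5), 48, 12, 1000),
    (2, date(2008, 12, 20), 42, 6, 25),
]


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def expand(schedules):
    """Return (sched_id, payment_num, due_date, running_total) rows."""
    rows = []
    for sched_id, start, term, frequency, amount in schedules:
        for n in range(1, term // frequency + 1):
            due = add_months(start, (n - 1) * frequency)
            rows.append((sched_id, n, due, n * amount))
    return rows


for row in expand(SCHEDULES):
    print(row)
```

With the sample schedules this produces 4 rows for schedule 1 and 7 for schedule 2, the same 11 rows as the model clause output above.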

I didn't set out to answer my own question, but I'm doing work with Oracle now and I have had to learn some new Oracle-flavored things.
Anyway, the CONNECT BY clause is really nice (yes, much nicer than MSSQL's hierarchical query approach), and using that construct I was able to produce a very clean query that does what I was looking for:
SELECT DISTINCT
t.SchedID
,level as PaymentNum
,add_months(t.StartDate, (level - 1) * t.Frequency) as DueDate
,(level * t.PaymentAmt) as RunningTotal
FROM SchedTest t
CONNECT BY level <= (t.Term / t.Frequency)
ORDER BY t.SchedID, level
My only remaining issue is that I had to use DISTINCT because I couldn't figure out how to select my rows from DUAL (the affable one-row Oracle table) instead of from my table of schedule data, which has at least 2 rows. If I could do the above with FROM DUAL, then my DISTINCT indicator wouldn't be necessary. Any thoughts?
Other than that, I think this is pretty nice. Et tu?
