Get the ranges from the table on the basis of value - sql

I have some specific transaction count value and I have to check if that count lies within what range. The ranges are specified in the below table. For example, if the count value is >= 1 AND less than 250001 from the range_start column then the count lies with range_id 1.
The tricky part is that the count has to be split across bands. For example, with the sample ranges below, if I have 30 transactions on the first day of the month, the count reaches range_id 3, so I calculate the fees for the first 10 transactions on the basis of range_id 1, the next 10 on the basis of range_id 2, and the remaining 10 on the basis of range_id 3. On the second day of the month the fee calculation continues from the point reached on the first day (the end of band 3 in this example) and keeps moving to the next bands as the transaction volume increases.
More explanation:
The fee calculation works like this:
Total transactions on the first day of the month: 30
Auth fees for the first 10 transactions would be 10 * 0.1698 (range_id 1) = 1.698
Auth fees for the next 10 transactions would be 10 * 0.1536 (range_id 2) = 1.536
Auth fees for the last 10 transactions would be 10 * 0.1403 (range_id 3) = 1.403
Total transactions on the second day of the month: 20
Auth fees for the first 10 transactions would be 10 * 0.1203 (range_id 4) = 1.203
Auth fees for the second 10 transactions would be 10 * 0.1036 (range_id 5) = 1.036
Total transactions on the third day of the month: 5
Auth fees for the 5 transactions would be 5 * 0.0873 (range_id 6) = 0.4365
I am saving the running transaction count in a table: after the first day it would be 30, after the second day 50, and after the third day 55. On the first day of the month it is reset to 0.
The fee calculation runs daily. The transaction table only holds data for one day and is dropped at the end of the calculation, so I also keep the total count for the previous business day in a table (gstl_daily_volume) in order to calculate the range_id on the next day.
I have achieved the band calculation with the queries below, but they only calculate the fees accurately for one day. On the second day the calculation starts again from range_id 1, because it does not consider the previous day's volume from the gstl_daily_volume table. Please help me understand how I can continue on the second day from where I left off on the first day of the month, taking the previous day's volume into account.
declare @rangeTable TABLE
(
    rangeId INT,
    rangeStart INT,
    rangeEnd INT NULL,
    authFees decimal(8,4),
    settlementFees decimal(8,4),
    declinedFees decimal(8,4)
)
insert into @rangeTable
values
    (1, 1, 11, 0.1698, 0.1359, 0.3284),
    (2, 11, 21, 0.1536, 0.1536, 0.3280),
    (3, 21, 31, 0.1403, 0.1330, 0.3278),
    (4, 31, 41, 0.1203, 0.1320, 0.3276),
    (5, 41, 51, 0.1036, 0.1310, 0.3274),
    (6, 51, NULL, 0.0873, 0.1300, 0.3272)
declare @transactionsTable TABLE
(
    transactionId INT IDENTITY(1,1),
    transactionDate DateTime,
    transactionAmount decimal(8,2)
)
insert into @transactionsTable
values
    (N'2020-12-01', 500),
    (N'2020-12-01', 501)
--- calculate per date total transaction fees
select
    Datee = CAST(C.transactionDate AS DATE),
    TotalSettlememtFee = SUM(C.settlementFees)
From
(
    select
        *
        -- each transaction counts as 1
        --,AuthFee = 1 * B.authFees
        --,SettlememtFee = 1 * settlementFees
    from
    (
        -- set a row number or transaction number per transaction which resets every month
        select
            rowNumber = ROW_NUMBER() OVER(PARTITION BY DATEPART(MONTH, transactionDate), DATEPART(YEAR, transactionDate) ORDER BY transactionDate),
            *
        from @transactionsTable tt
    ) A
    -- cross apply to calculate which range for each transaction
    CROSS APPLY
    (
        select
            *
        From @rangeTable rtt
        where A.rowNumber >= rtt.rangeStart
        AND A.rowNumber < rtt.rangeEnd
    ) B
) C
-- group by date to get the per date fees
group by CAST(C.transactionDate AS DATE)
Daily volume table, where the total count is saved for each day:
GO
/****** Object: Table [dbo].[gstl_daily_volume] Script Date: 1/4/2021 5:50:08 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[gstl_daily_volume](
[business_date] [datetime] NULL,
[record_type] [varchar](50) NULL,
[total_count] [varchar](50) NULL,
[band_id] [bigint] NULL
) ON [PRIMARY]
GO
Please help, thanks in advance.

The transactions table is a staging table, i.e. it only holds data for the current day and is emptied at the end of the day.
I think that if you add the previous days' transaction count to your calculated row number, this will work as expected.
-- 0 is only correct on the first business day of the month
DECLARE @previousDayTransactionCount INT = 0;
-- TODO: set this to MAX from gstl_daily_volume for current month
--- calculate per date total transaction fees
select
    Datee = CAST(C.transactionDate AS DATE),
    TotalSettlememtFee = SUM(C.settlementFees)
From
(
    select
        *
        -- each transaction counts as 1
        --,AuthFee = 1 * B.authFees
        --,SettlememtFee = 1 * settlementFees
    from
    (
        -- set a row number or transaction number per transaction which resets every month
        -- Then Add it here to current calculated rowNumber
        select
            rowNumber = @previousDayTransactionCount + (ROW_NUMBER() OVER(PARTITION BY DATEPART(MONTH, transactionDate), DATEPART(YEAR, transactionDate) ORDER BY transactionDate)),
            *
        from @transactionsTable tt
    ) A
    -- cross apply to calculate which range for each transaction
    CROSS APPLY
    (
        select
            *
        From @rangeTable rtt
        where A.rowNumber >= rtt.rangeStart
        -- the last range is open-ended (rangeEnd is NULL)
        AND (rtt.rangeEnd IS NULL OR A.rowNumber < rtt.rangeEnd)
    ) B
) C
-- group by date to get the per date fees
group by CAST(C.transactionDate AS DATE)
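The carry-over idea can be checked outside the database. What follows is a minimal Python sketch, not the T-SQL itself: it assumes the sample ranges and auth fees from the question, treats a NULL rangeEnd as open-ended, and uses a hypothetical helper auth_fee_for_day whose first argument plays the role of the previous-day count.

```python
from decimal import Decimal

# (rangeId, rangeStart, rangeEnd, authFees) copied from the question's sample data.
# rangeEnd is exclusive; None stands for the open-ended last range.
RANGES = [
    (1, 1, 11, Decimal("0.1698")),
    (2, 11, 21, Decimal("0.1536")),
    (3, 21, 31, Decimal("0.1403")),
    (4, 31, 41, Decimal("0.1203")),
    (5, 41, 51, Decimal("0.1036")),
    (6, 51, None, Decimal("0.0873")),
]


def auth_fee_for_day(previous_count, todays_count):
    """Sum the auth fee of each of today's transactions.

    previous_count is the month-to-date count before today (a hypothetical
    stand-in for the value read from gstl_daily_volume).
    """
    total = Decimal("0")
    first = previous_count + 1
    last = previous_count + todays_count
    for n in range(first, last + 1):
        # n is the month-to-date transaction number, like rowNumber in the query
        for _range_id, start, end, fee in RANGES:
            if n >= start and (end is None or n < end):
                total += fee
                break
    return total


# Month-to-date counts before each day are 0, 30 and 50, as in the question.
day1 = auth_fee_for_day(0, 30)
day2 = auth_fee_for_day(30, 20)
day3 = auth_fee_for_day(50, 5)
print(day1, day2, day3)
```

Passing 0 as the offset on the second day reproduces the problem from the question: the 20 transactions would be priced from range_id 1 again.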

Related

Sum over N days excluding Weekends and Holidays

I have the table below:
AccountID | Date       | Amount
123       | 07/02/2021 | 2000
123       | 07/09/2021 | 9000
123       | 07/15/2021 | 500
123       | 07/20/2021 | 500
123       | 07/28/2021 | 500
I am trying to create a test script to test data for just one month(July). I want to sum the amount over 5 days where 5 days does not count weekends and holidays. Since it is month of July the holiday falls on July 5th 2021(07/05/2021).
The output should look something like this:
AccountID | Date       | Amount
123       | 07/02/2021 | 11000
123       | 07/09/2021 | 9500
123       | 07/15/2021 | 1000
123       | 07/20/2021 | 500
123       | 07/28/2021 | 500
Below are the table creation and data insert statements for reference:
create table TRANSACTIONS (
AccountID int,
Date date,
Amount int
)
insert into TRANSACTIONS values (123, '07/02/2021', 2000)
insert into TRANSACTIONS values (123, '07/09/2021', 9000)
insert into TRANSACTIONS values (123, '07/15/2021', 500)
insert into TRANSACTIONS values (123, '07/20/2021', 500)
insert into TRANSACTIONS values (123, '07/28/2021', 500)
I was able to create a script that sums over 5 days while skipping weekends (Saturday and Sunday), but I can't work out how to also skip the holiday on July 5th, 2021. I am fine with hardcoding it since this is just for testing purposes. The condition 'DATEPART(WEEKDAY, h2.Date) not in (1, 7)' skips weekends. In 'DATEADD(d, 6, h1.Date)' I add 6 rather than 5, even though the sum should cover 5 days, because after reading some articles I figured that when skipping weekends the last day is not inclusive. This code sums correctly over 5 days, skipping weekends:
SELECT AccountId, Date,
(
SELECT SUM(Amount)
FROM TRANSACTIONS h2
WHERE
h1.AccountID = h2.AccountID and
DATEPART(WEEKDAY, h2.Date) not in (1, 7) and
h2.Date between h1.Date AND DATEADD(d, 6, h1.Date)
) as SumAmount
FROM TRANSACTIONS h1
The only sane way to tackle this is to have a calendar table to represent holidays. The easiest approach is to store every date for the date range you're likely to need (e.g. 1970-2030) together with the type of the date, perhaps an enum of WORKDAY, WEEKEND, HOLIDAY or whatever works, e.g.
CREATE TABLE CALENDAR (
Date DATE,
Day_type varchar(16)
);
-- insert rows for dates you care about
Depending on where you live, you may need to include a region column too (typically the country and/or state).
With such a table, you join to it:
SELECT
AccountId,
DATEADD(DAY, (DATEDIFF(DAY, 0, t.Date)/7)*7 + 7, 0) as Date,
SUM(Amount)
FROM TRANSACTIONS t
JOIN CALENDAR c on t.Date = c.Date
AND c.day_type = 'WORKDAY'
WHERE t.Date BETWEEN <your date range>
GROUP BY AccountId, DATEADD(DAY, (DATEDIFF(DAY, 0, t.Date)/7)*7 + 7, 0)
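To make the "5 business days" window from the question concrete, here is a minimal Python sketch. It is not the calendar-table query above: the holiday set stands in for the HOLIDAY rows of such a table, and the helper names (is_business_day, window_end, rolling_sums) are hypothetical. The window starts on the transaction date and ends on the fifth business day, counting the start date itself.

```python
from datetime import date, timedelta

# Hard-coded holiday for the July test, as the question allows.
HOLIDAYS = {date(2021, 7, 5)}


def is_business_day(d, holidays=HOLIDAYS):
    # weekday() is 0 for Monday ... 6 for Sunday
    return d.weekday() < 5 and d not in holidays


def window_end(start, n_days, holidays=HOLIDAYS):
    """Return the n-th business day on or after start."""
    d = start
    counted = 0
    while True:
        if is_business_day(d, holidays):
            counted += 1
            if counted == n_days:
                return d
        d += timedelta(days=1)


def rolling_sums(rows, n_days=5):
    """rows: list of (account_id, date, amount) tuples."""
    result = []
    for account, start, _amount in rows:
        end = window_end(start, n_days)
        total = sum(
            amount
            for acc, d, amount in rows
            if acc == account and start <= d <= end and is_business_day(d)
        )
        result.append((account, start, total))
    return result


# Sample rows from the question.
transactions = [
    (123, date(2021, 7, 2), 2000),
    (123, date(2021, 7, 9), 9000),
    (123, date(2021, 7, 15), 500),
    (123, date(2021, 7, 20), 500),
    (123, date(2021, 7, 28), 500),
]

for account, start, total in rolling_sums(transactions):
    print(account, start.strftime("%m/%d/%Y"), total)
```

In SQL the same effect could be had by numbering the WORKDAY rows of the calendar table and summing transactions whose business-day number lies within 4 of the starting row's number.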

SQL Selecting records sorted by growth over time

I have a simple table that contains a daily summary of the sales volumes of a couple hundred thousand products. One row for each product and date, with whatever quantity was sold that day. Table format is:
CREATE TABLE DAILYSALES (ID numeric IDENTITY PRIMARY KEY, ProductID numeric NOT NULL, XDate Date NOT NULL, QTY_SOLD int NOT NULL)
A record will only be in the table if there were sales that day, so there are no records where QTY_SOLD is zero.
I need to figure out a way to query this data within a date range, say, the last 30 days, but sorted by a growth trend (products that showed the most growth over the period would be on top).
The difference in quantities sold is off the charts... some products sell 1,000+ units per day consistently, while others sell 1 or 2 or zero on an average day and just have a couple of spikes here and there.
In an ideal result set, a product that sold 10 units a day on the first of the month, and grew by one unit a day to 40 units per day at the end of the month would rank higher than a product that sold 1,000 units a day on average and grew to 2,000 by the end of the month (a 4X growth level vs 2X).
The trouble I keep running into is that products with little to no sales but a couple of big spikes near the end always end up on top. A product that goes from 1 sale at the start of the month, nothing all month, and then 20 sales on the last day would show up first with the above model -- that shouldn't outrank a product with steadier sales.
I'm not sure what the best way to write this query would be. I imagine some kind of subquery that factors in the number of records (ie; number of days with data) that exist in the result set should be a factor, but I'm not sure where to begin. Would appreciate any suggestions, in particular from those who work with large data sets and have had to do something similar.
I would suggest trying linear regression for this task. First calculate the slope per article, then sort by slope descending. This way you should be able to identify the article with the best growth. In the following example I have one article without sales, one with constant sales, and one that starts at 0 and then jumps to 50 per day from August onward:
DECLARE @t TABLE(
    ArticleId int,
    SoldDate date,
    SoldQty int
)
;WITH cteDat AS(
    SELECT CAST('2020-01-01' AS DATE) AS Dat
    UNION ALL
    SELECT DATEADD(d, 1, Dat)
    FROM cteDat
    WHERE Dat < '2020-12-31'
)
INSERT INTO @t
SELECT 123 AS ArticleId, Dat AS SoldDate, 0 AS SoldQty
FROM cteDat
UNION ALL
SELECT 456 AS ArticleId, Dat AS SoldDate, 100 AS SoldQty
FROM cteDat
UNION ALL
SELECT 789 AS ArticleId, Dat AS SoldDate, 0 AS SoldQty
FROM cteDat
OPTION (MAXRECURSION 0)
UPDATE @t
SET SoldQty = 50
WHERE ArticleId = 789
AND MONTH(SoldDate) > 7
;WITH cteRaw AS(
    SELECT CAST(ArticleId AS FLOAT) AS ArticleId, CAST(CONVERT(NVARCHAR(8), SoldDate, 112) AS FLOAT) DatSID, CAST(SoldQty AS FLOAT) AS SoldQty
    FROM @t
),
cteLinRegBase AS(
    SELECT ArticleId
        ,COUNT(*) AS SampleSize
        ,SUM(DatSID) AS SumX
        ,SUM(SoldQty) AS SumY
        ,SUM(DatSID*DatSID) AS SumXX
        ,SUM(SoldQty*SoldQty) AS SumYY
        ,SUM(DatSID*SoldQty) AS SumXY
    FROM cteRaw
    GROUP BY ArticleId
)
SELECT ArticleId, CASE
        WHEN SampleSize = 1 THEN 0 -- avoid divide by zero error
        ELSE ( SampleSize * sumXY - sumX * sumY ) / ( SampleSize * sumXX - Power(sumX, 2) )
    END AS Slope
FROM cteLinRegBase
ORDER BY Slope DESC -- best growth first
However, instead of calculating with the date as number, you could also add a rownumber or whatever to represent the X axis.
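As a rough illustration of the row-number variant, the Python sketch below applies the same least-squares slope formula with x = 0, 1, 2, ... as the day index. The function name slope and the sample series are assumptions made for the example, and it presumes that days without a sales row have already been filled in as 0.

```python
def slope(ys):
    """Least-squares slope of ys against x = 0, 1, 2, ... (one step per day)."""
    n = len(ys)
    sum_x = sum(range(n))
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in range(n))
    sum_xy = sum(x * y for x, y in enumerate(ys))
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:  # fewer than two points, avoid dividing by zero
        return 0.0
    return (n * sum_xy - sum_x * sum_y) / denominator


# Hypothetical daily quantities for three articles over 31 days.
series = {
    123: [0] * 31,               # no sales at all
    456: [100] * 31,             # constant sales
    789: list(range(10, 41)),    # grows by one unit per day, 10 up to 40
}

# Rank articles by slope, steepest growth first.
ranking = sorted(series, key=lambda article: slope(series[article]), reverse=True)
print(ranking)
```

The slope is in units per day, so a large product growing from 1,000 to 2,000 would still outrank a small one growing from 10 to 40. Dividing each slope by the article's average quantity is one possible way to compare relative growth, but that is a modelling choice beyond what the answer states.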

How can I query for overlapping date ranges?

I'm using SQL Server 2008 R2 and trying to create a query that will show whether dates overlap.
I'm trying to calculate the number of days someone is covered under a certain criteria. Here is an example of the table...
CREATE TABLE mytable
(
CARDNBR varchar(10) ,
GPI char(14) ,
GPI_DESCRIPTION_10 varchar(50) ,
RX_DATE datetime ,
DAYS_SUPPLY int ,
END_DT datetime ,
METRIC_QUANTITY float
)
INSERT INTO mytable VALUES ('1234567890','27200040000315','Glyburide','01/30/2013','30','03/01/2013','60')
INSERT INTO mytable VALUES ('1234567890','27200040000315','Glyburide','03/04/2013','30','04/03/2013','60')
INSERT INTO mytable VALUES ('1234567890','27250050007520','Metformin','01/03/2013','30','02/02/2013','120')
INSERT INTO mytable VALUES ('1234567890','27250050007520','Metformin','02/27/2013','30','03/29/2013','120')
I want to be able to count the number of days that a person was covered from the first RX_DATE to the last END_DT, which in this example is 90 days (4/3/13 - 1/3/13).
That part is done, but this is where I'm getting into trouble.
Between row 1 and row 2, there was a 3 day period where there were no drugs being taken. Between rows 3 and 4 there was a 25 day period. However, during that 25 day period, row 1 covered that gap. So the end number I need to show is 3 for the gap between rows 1 and 2.
Any help would be greatly appreciated.
Thanks.
There might be a better approach, but you could create a lookup of days, join to it and select the distinct days that join, that will get you the total count of days covered for all lines:
CREATE TABLE #lkp_Calendar (Dt DATE)
GO
SET NOCOUNT ON
DECLARE @intFlag INT
SET @intFlag = 1
WHILE (@intFlag <=500)
BEGIN
    --Loop through this:
    INSERT INTO #lkp_Calendar
    SELECT DATEADD(day,@intFlag,'20120101')
    SET @intFlag = @intFlag + 1
END
GO
--Days Covered
SELECT CARDNBR, COUNT(DISTINCT b.Dt)CT
FROM #mytable a
JOIN #lkp_Calendar b
ON b.Dt BETWEEN a.RX_DATE AND a.END_DT
GROUP BY CARDNBR
--Total Days
SELECT CARDNBR, DATEDIFF(DAY,MIN(RX_DATE),MAX(END_DT))+1 'Total_Days'
FROM #mytable
GROUP BY CARDNBR
--Combined
SELECT covered.CARDNBR, covered.CT 'Days Covered', total.Total_Days 'Total Days', total.Total_Days - covered.CT 'Days Gap'
FROM (SELECT CARDNBR, COUNT(DISTINCT b.Dt)CT
FROM #mytable a
JOIN #lkp_Calendar b
ON b.Dt BETWEEN a.RX_DATE AND a.END_DT
GROUP BY CARDNBR
)covered
JOIN (SELECT CARDNBR, DATEDIFF(DAY,MIN(RX_DATE),MAX(END_DT))+1 'Total_Days'
FROM #mytable
GROUP BY CARDNBR
)total
ON covered.CARDNBR = total.CARDNBR
You said 90 days, but I believe you should have 91. Date diff from Mon-Wed is only 2, but that's 3 days covered. But you can decide if coverage begins on the rx date or the day after.
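The distinct-days idea can be sketched in a few lines of Python. This is only an illustration of the counting logic, not a replacement for the query: the coverage helper is hypothetical, and it assumes coverage includes both RX_DATE and END_DT.

```python
from datetime import date, timedelta

# (RX_DATE, END_DT) pairs for CARDNBR 1234567890, copied from the question.
prescriptions = [
    (date(2013, 1, 30), date(2013, 3, 1)),   # Glyburide
    (date(2013, 3, 4), date(2013, 4, 3)),    # Glyburide
    (date(2013, 1, 3), date(2013, 2, 2)),    # Metformin
    (date(2013, 2, 27), date(2013, 3, 29)),  # Metformin
]


def coverage(ranges):
    """Return (days_covered, total_days, gap_days) for inclusive date ranges."""
    covered = set()
    for start, end in ranges:
        for offset in range((end - start).days + 1):
            covered.add(start + timedelta(days=offset))  # set keeps days distinct
    first = min(start for start, _ in ranges)
    last = max(end for _, end in ranges)
    total_days = (last - first).days + 1
    return len(covered), total_days, total_days - len(covered)


print(coverage(prescriptions))       # all four rows together
print(coverage(prescriptions[:2]))   # the two Glyburide rows on their own
```

With all four rows, every day from 01/03/2013 to 04/03/2013 falls inside at least one prescription, because the second Metformin row (02/27 to 03/29) spans 03/02 and 03/03. A gap only shows up when the Glyburide rows are counted on their own.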

group data by any range of 30 days (not by range of dates) in SQL Server

I have a table with a list of transactions.
For the example, let's say it has 4 fields:
ID, UserID, DateAdded, Amount
I would like to run a query that checks whether there was any 30-day period in which a user made transactions totalling 100 or more.
I saw lots of samples of grouping by month or by day, but they don't cover this case. For example,
if a user made a $50 transaction on 20/4 and another $50 transaction on 5/5, the query should show it (it's $100 or more within a period of 30 days).
I think that this should work (I'm assuming that transactions have a date component, and that a user can have multiple transactions on a single day):
;with DailyTransactions as (
select UserID,DATEADD(day,DATEDIFF(day,0,DateAdded),0) as DateOnly,SUM(Amount) as Amount
from Transactions group by UserID,DATEADD(day,DATEDIFF(day,0,DateAdded),0)
), Numbers as (
select ROW_NUMBER() OVER (ORDER BY object_id) as n from sys.objects
), DayRange as (
select n from Numbers where n between 1 and 29
)
select
dt.UserID,dt.DateOnly as StartDate,MAX(ot.DateOnly) as EndDate, dt.Amount + COALESCE(SUM(ot.Amount),0) as TotalSpend
from
DailyTransactions dt
cross join
DayRange dr
left join
DailyTransactions ot
on
dt.UserID = ot.UserID and
DATEADD(day,dr.n,dt.DateOnly) = ot.DateOnly
group by dt.UserID,dt.DateOnly,dt.Amount
having dt.Amount + COALESCE(SUM(ot.Amount),0) >= 100.00
Okay, I'm using 3 common table expressions. The first (DailyTransactions) is reducing the transactions table to a single transaction per user per day (this isn't necessary if the DateAdded is a date only, and each user has a single transaction per day). The second and third (Numbers and DayRange) are a bit of a cheat - I wanted to have the numbers 1-29 available to me (for use in a DATEADD). There are a variety of ways of creating either a permanent or (as in this case) temporary Numbers table. I just picked one, and then in DayRange, I filter it down to the numbers I need.
Now that we have those available to us, we write the main query. We're querying for rows from the DailyTransactions table, but we want to find later rows in the same table that are within 30 days. That's what the left join to DailyTransactions is doing. It's finding those later rows, of which there may be 0, 1 or more. If it's more than one, we want to add all of those values together, so that's why we need to do a further bit of grouping at this stage. Finally, we can write our having clause, to filter down only to those results where the Amount from a particular day (dt.Amount) + the sum of amounts from later days (SUM(ot.Amount)) meets the criteria you set out.
I based this on a table defined thus:
create table Transactions (
UserID int not null,
DateAdded datetime not null,
Amount decimal (38,2)
)
If I understand you correctly, you need a calendar table and then check the sum between date and date+30. So if you want to check a period of 1 year you need to check something like 365 periods.
Here is one way of doing that. The recursive CTE creates the calendar and the cross apply calculates the sum for each CalDate between CalDate and CalDate+30.
declare @T table(ID int, UserID int, DateAdded datetime, Amount money)
insert into @T values(1, 1, getdate(), 50)
insert into @T values(2, 1, getdate()-29, 60)
insert into @T values(4, 2, getdate(), 40)
insert into @T values(5, 2, getdate()-29, 50)
insert into @T values(7, 3, getdate(), 70)
insert into @T values(8, 3, getdate()-30, 80)
insert into @T values(9, 4, getdate()+50, 50)
insert into @T values(10,4, getdate()+51, 50)
declare @FromDate datetime
declare @ToDate datetime
select
    @FromDate = min(dateadd(d, datediff(d, 0, DateAdded), 0)),
    @ToDate = max(dateadd(d, datediff(d, 0, DateAdded), 0))
from @T
;with cal as
(
    select @FromDate as CalDate
    union all
    select CalDate + 1
    from cal
    where CalDate < @ToDate
)
select S.UserID
from cal as C
cross apply
    (select
        T.UserID,
        sum(Amount) as Amount
    from @T as T
    where T.DateAdded between CalDate and CalDate + 30
    group by T.UserID) as S
where S.Amount >= 100
group by S.UserID
option (maxrecursion 0)
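For comparison, the windowed check can be written as a short Python sketch. It is an illustration under stated assumptions rather than a port of either query: the helper users_over_threshold is hypothetical, rows carry plain dates, the year in the sample data is made up, and a window is taken to be the 30 days starting on a transaction date (start day included, day start+30 excluded). Starting windows only at transaction dates is enough, since any qualifying window can be shifted forward to begin on its first transaction.

```python
from datetime import date, timedelta


def users_over_threshold(rows, window_days=30, threshold=100):
    """rows: (user_id, date_added, amount) tuples; returns the set of user ids
    that reach the threshold inside some window of window_days days."""
    by_user = {}
    for user_id, added, amount in rows:
        by_user.setdefault(user_id, []).append((added, amount))

    flagged = set()
    for user_id, items in by_user.items():
        for start, _amount in items:
            end = start + timedelta(days=window_days)  # exclusive upper bound
            total = sum(a for d, a in items if start <= d < end)
            if total >= threshold:
                flagged.add(user_id)
                break
    return flagged


rows = [
    (1, date(2021, 4, 20), 50),  # the 20/4 and 5/5 example from the question
    (1, date(2021, 5, 5), 50),
    (2, date(2021, 1, 1), 50),   # same total, but 59 days apart
    (2, date(2021, 3, 1), 50),
]
print(users_over_threshold(rows))
```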

SQL how to make one query out of multiple ones

I have a table that holds monthly billing records. Say Customer 1234 was billed in Jan/Feb and Customer 2345 was billed in Jan/Feb/Mar. How can I group these to show a concurrent monthly billing cycle? I also need to show non-concurrent billed months, e.g. Customer 3456 was billed in Feb/Apr/Jun/Aug.
SELECT custName, month, billed, count(*) as Tally
FROM db_name
WHERE
GROUP BY
Results needed:
Customer 1234 was billed for 2 months Concurrent
Customer 2345 was billed for 3 months Concurrent
Customer 3456 was billed for 4 months Non-Concurrent
Any suggestions?
If the month is stored as a datetime field, you can use DATEDIFF to calculate the number of months between the first and the last bill. If the number of elapsed months equals the total number of bills, the bills are consecutive.
select
'Customer ' + custname + ' was billed for ' +
cast(count(*) as varchar) + ' months ' +
case
when datediff(month,min(billdate),max(billdate))+1 = count(*)
then 'Concurrent'
else 'Non-Concurrent'
end
from #billing
where billed = 1
group by custname
If you store the billing month as an integer, you can just subtract instead of using DATEDIFF. Replace the WHEN row with:
when max(billdate)-min(billdate)+1 = count(*)
But in that case I wonder how you distinguish between years.
If the months were all in a sequence, and we are limiting our search to a particular year then Min(month) + Count(times billed) - 1 should = Max(month).
declare @billing table(Custname varchar(10), month int, billed bit)
insert into @billing values (1234, 1, 1)
insert into @billing values (1234, 2, 1)
insert into @billing values (2345, 3, 1)
insert into @billing values (2345, 4, 1)
insert into @billing values (2345, 5, 1)
insert into @billing values (3456, 1, 1)
insert into @billing values (3456, 3, 1)
insert into @billing values (3456, 9, 1)
insert into @billing values (3456, 10, 1)
Select CustName, Count(1) as MonthsBilled,
    Case
        when Min(Month) + Count(1) - 1 = Max(Month)
        then 1
        else 0
    end Concurrent
From @billing
where Billed = 1
Group by CustName
Cust Months Concurrent
1234 2 1
2345 3 1
3456 4 0
The suggestions here work based on an assumption that you will never bill a customer twice or more in the same month. If that isn't a safe assumption, you need a different approach. Let us know if that's the case.
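Both caveats raised so far (year boundaries and repeated bills in one month) can be handled by mapping each bill to a running month index and de-duplicating before comparing. The Python sketch below shows that idea; the function is_consecutive and the (year, month) input shape are assumptions for illustration.

```python
def is_consecutive(months):
    """months: iterable of (year, month) pairs, one per bill.

    Returns True when the distinct billed months form an unbroken run.
    """
    # year * 12 + month increases by exactly 1 from one calendar month to the
    # next, including December -> January; the set drops duplicate months.
    indexes = {year * 12 + month for year, month in months}
    if not indexes:
        return False
    return max(indexes) - min(indexes) + 1 == len(indexes)


print(is_consecutive([(2020, 11), (2020, 12), (2021, 1)]))           # run across a year boundary
print(is_consecutive([(2021, 1), (2021, 3), (2021, 9), (2021, 10)])) # months with gaps
print(is_consecutive([(2021, 1), (2021, 1), (2021, 2)]))             # billed twice in January
```

In SQL the equivalent would be to compare the span of the month index with COUNT(DISTINCT ...) of that index instead of COUNT(*).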
how about:
SELECT custName, month, count(*) as tally
from billing
where billed = 1
group by custName, month
You left out some important information (like how Month is stored) and what database you're using, but here's a logical approach that you can start with:
CREATE VIEW CustomerBilledInMonth (CustName, Month, AmountBilled, ContinuousFlag) AS
SELECT CustName, Month, SUM(AmountBilled), 'Noncontinuous'
FROM BillingTable BT1
WHERE NOT EXISTS
(SELECT * FROM BillingTable BT2 WHERE BT2.CustName = BT1.CustName AND BT2.Month = BT1.Month - 1)
GROUP BY CustName, Month
UNION
SELECT CustName, Month, SUM(AmountBilled), 'Continuous'
FROM BillingTable BT1
WHERE EXISTS
(SELECT * FROM BillingTable BT2 WHERE BT2.CustName = BT1.CustName AND BT2.Month = BT1.Month - 1)
GROUP BY CustName, Month
Assuming that Month here is a consecutive integer field incremented by one from the first possible month in the system, this gives you each customer's billing summed up per month, plus a flag containing 'Continuous' for months that followed a month in which the customer was also billed and 'Noncontinuous' for months that followed a month in which the customer was not billed.
Then:
SELECT CustName, LISTOF(Month), SUM(AmountBilled), MAX(ContinuousFlag)
FROM CustomerBilledInMonth GROUP BY CustName
will give you more or less what you want (where LISTOF is some kind of COALESCE type function dependent on the exact database you're using).