I need to extract the names of all customers whose transactions were less than 5000 each in every month for 6 consecutive months, followed by 3 transactions of 20,000 each in the 7th month.
Each transaction for a customer is stored in its own row.
Example: for customer A, the information is stored as follows:
Name | TransactionDate | Amount
1. CustomerA | 27-08-2015 | 4500
2. CustomerA | 27-09-2015 | 4500
3. CustomerA | 27-10-2015 | 4500
4. CustomerA | 27-11-2015 | 4500
5. CustomerA | 27-12-2015 | 4500
6. CustomerA | 27-01-2016 | 4500
7. CustomerA | 27-02-2016 | 20000
8. CustomerA | 27-02-2016 | 20000
9. CustomerA | 27-02-2016 | 20000
Since you have not specified the SQL flavor, here is a flexible solution for T-SQL:
1) To keep the queries simpler, I defined a persisted computed column that stores the month number:
create table CustomerTransaction
(
CustomerName VARCHAR(20),
TransactionDate DATE,
Amount NUMERIC(18, 2),
MonthNo AS DATEPART(yyyy, TransactionDate) * 12 + DATEPART(mm, TransactionDate) - 1 PERSISTED
)
If a computed column cannot be used, you can employ date arithmetic (DATEDIFF) or inline the exact computation.
The first CTE returns the transaction data with a row number and a group month number (month number minus row number), which stays constant while a customer's payments fall in consecutive months.
Two further CTEs select from it by amount: one for the small amounts and one for the 20K (big) amounts.
Each series is then checked against the count criteria (6 small payments, followed by 3 big ones).
Finally, small and big payments are joined on customer and month: the month of the big payments must equal the group month of the small series plus the series length (6) plus 1.
The final query is the following:
declare @SmallAmountsLen INT = 6;
declare @BigAmountsLen INT = 3;
declare @SmallAmountThreshold NUMERIC(18, 2) = 5000;
declare @BigAmount NUMERIC(18, 2) = 20000;
-- Number each customer's rows by date and derive the group month number
;with AmountCte AS (
SELECT CustomerName, TransactionDate, Amount, MonthNo, ROW_NUMBER() OVER (PARTITION BY CustomerName ORDER BY TransactionDate) AS RowNo,
MonthNo - ROW_NUMBER() OVER (PARTITION BY CustomerName ORDER BY TransactionDate) AS GroupMonthNo
FROM CustomerTransaction
),
-- Payments below the small-amount threshold
SmallAmountCte AS (
SELECT *
FROM AmountCte
WHERE Amount < @SmallAmountThreshold
),
-- Payments of exactly the big amount
BigAmountCte AS (
SELECT *
FROM AmountCte
WHERE Amount = @BigAmount
),
-- Series of small payments with the required length
SmallGroupCte AS (
select CustomerName, GroupMonthNo
from SmallAmountCte
group by CustomerName, GroupMonthNo
having count(1) = @SmallAmountsLen
),
-- Months holding the required number of big payments
BigGroupCte AS (
select CustomerName, MonthNo
from BigAmountCte
group by CustomerName, MonthNo
having count(1) = @BigAmountsLen
)
select S.*, B.*
from SmallGroupCte S
join BigGroupCte B on B.CustomerName = S.CustomerName
where B.MonthNo = S.GroupMonthNo + @SmallAmountsLen + 1
[EDIT] Query without need of a computed column
declare @SmallAmountsLen INT = 6;
declare @BigAmountsLen INT = 3;
declare @SmallAmountThreshold NUMERIC(18, 2) = 5000;
declare @BigAmount NUMERIC(18, 2) = 20000;
-- Same logic as above, with the month number computed inline
;with AmountCte AS (
SELECT CustomerName, TransactionDate, Amount, DATEPART(yyyy, TransactionDate) * 12 + DATEPART(mm, TransactionDate) - 1 AS MonthNo,
ROW_NUMBER() OVER (PARTITION BY CustomerName ORDER BY TransactionDate) AS RowNo,
DATEPART(yyyy, TransactionDate) * 12 + DATEPART(mm, TransactionDate) - 1 - ROW_NUMBER() OVER (PARTITION BY CustomerName ORDER BY TransactionDate) AS GroupMonthNo
FROM CustomerTransaction
),
SmallAmountCte AS (
SELECT *
FROM AmountCte
WHERE Amount < @SmallAmountThreshold
),
BigAmountCte AS (
SELECT *
FROM AmountCte
WHERE Amount = @BigAmount
),
SmallGroupCte AS (
select CustomerName, GroupMonthNo
from SmallAmountCte
group by CustomerName, GroupMonthNo
having count(1) = @SmallAmountsLen
),
BigGroupCte AS (
select CustomerName, MonthNo
from BigAmountCte
group by CustomerName, MonthNo
having count(1) = @BigAmountsLen
)
select S.*, B.*
from SmallGroupCte S
join BigGroupCte B on B.CustomerName = S.CustomerName
where B.MonthNo = S.GroupMonthNo + @SmallAmountsLen + 1
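To make the month-number idea easier to follow outside SQL, here is a minimal Python sketch of the same check. It is not a translation of the query above: it assumes that "6 consecutive months" means each of the 6 months before the big month has at least one transaction and all of them are below the threshold. The function name `find_customers` and the second customer, CustomerB, are hypothetical and only there for illustration.

```python
from collections import defaultdict
from datetime import date

def month_no(d):
    # Same month index as the MonthNo column: year * 12 + month - 1
    return d.year * 12 + d.month - 1

def find_customers(rows, small_len=6, big_len=3, small_limit=5000, big_amount=20000):
    # customer -> month index -> amounts paid in that month
    by_month = defaultdict(lambda: defaultdict(list))
    for name, tx_date, amount in rows:
        by_month[name][month_no(tx_date)].append(amount)

    matches = []
    for name, months in by_month.items():
        for big_month, amounts in months.items():
            # The candidate month needs exactly big_len payments of big_amount
            if sum(1 for a in amounts if a == big_amount) != big_len:
                continue
            # Assumption: every one of the small_len preceding months has
            # at least one payment and all of its payments are small
            previous = range(big_month - small_len, big_month)
            if all(m in months and all(a < small_limit for a in months[m])
                   for m in previous):
                matches.append(name)
                break
    return sorted(matches)

# CustomerA is the example from the question
rows = [("CustomerA", date(2015, 8, 27), 4500),
        ("CustomerA", date(2015, 9, 27), 4500),
        ("CustomerA", date(2015, 10, 27), 4500),
        ("CustomerA", date(2015, 11, 27), 4500),
        ("CustomerA", date(2015, 12, 27), 4500),
        ("CustomerA", date(2016, 1, 27), 4500)]
rows += [("CustomerA", date(2016, 2, 27), 20000)] * 3
# CustomerB is hypothetical: same pattern, but November is missing
rows += [("CustomerB", date(2015, m, 27), 4500) for m in (8, 9, 10, 12)]
rows += [("CustomerB", date(2016, 1, 27), 4500)]
rows += [("CustomerB", date(2016, 2, 27), 20000)] * 3

result = find_customers(rows)
print(result)
```

Because the month index increases by exactly 1 from December to January, the year boundary needs no special handling.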
Here is a simpler query that gets the result I wanted:
WITH CTE
AS
(
SELECT * FROM
(
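SELECT DENSE_RANK() OVER (PARTITION BY Name ORDER BY DATEPART(YEAR,TransactionDate), DATEPART(MONTH,TransactionDate)) AS SrNo, -- rank by year and month so months stay in calendar order across a year boundary
Name,Amount,DatePart(Month,TransactionDate) AS MonthNo,TransactionDate FROM TransactionTable) AS A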
WHERE
(Amount <= 5000 AND SrNo < 7) OR (Amount = 20000 AND SrNo = 7)
)
SELECT A.Name AS Account_Number,A.Transaction_Date FROM
(
SELECT Name,Amount,COUNT(SrNo) As Sr,MAX(TransactionDate) AS Transaction_Date FROM CTE
WHERE SrNo = 7 AND Amount = 20000
GROUP BY Name,Amount
HAVING COUNT(srNo) = 3) AS A
INNER JOIN
(
SELECT Name,COUNT(SrNo) AS Ss FROM CTE
WHERE SrNo < 7 AND Amount <= 5000
GROUP BY Name
HAVING COUNT(srNo) = 6) AS B
ON A.Name = B.Name
My 'deals_payments' table is:
Due Date    Payment     ID
1-Mar-19    1,000.00    123
1-Apr-19    1,000.00    123
1-May-19    1,000.00    123
1-Jun-19    1,000.00    123
1-Jul-19    1,000.00    123
1-Aug-19    1,000.00    123
1-Jun-19      500.00    456
1-Jul-19      500.00    456
1-Aug-19      500.00    456
I have the SQL code:
select
count(*), payment
from (select deals_payments.*,
(row_number() over (order by due_date) -
row_number() over (partition by payment order by due_date)
) as grp
from deals_payments
where id = 123
) deals_payments
group by grp, payment
order by grp
which gives me what I want - the number of payments on each distinct amount - (here I only asked for ID 123):
COUNT(*) PAYMENT
6 1000.00
But now I need the sum of the payments of the two IDs (123 and 456) where the due dates are the same, and then the number of payments for each distinct amount, as in:
COUNT(*) PAYMENT
3 1000.00
3 1500.00
I tried the below but it gives me the 'missing right parenthesis' error. What is wrong??
select
count(*),
(select
sum(total) total
from (select distinct
due_date,
(select
sum(payment)
from deals_payments
where (due_date = a.due_date)) as total
from deals_payments a
where a.id in (123, 456)
and payment > 0)
group by due_date
order by due_date) b
from (select deals_payments.*,
(row_number() over (order by due_date) -
row_number() over (partition by payment order by due_date)
) as grp
from deals_payments
where id = 123
) deals_payments
group by grp, payment
order by grp
Taking your earlier comments into consideration, I agree that the SQL can be simplified to get the intended result. My understanding is that the expected output is the frequency of the total payment of a subset of IDs on any given date.
select count(*) as PaymentFrequency, TotalPaidOnDueDate from
(
select due_date, sum(payment) as TotalPaidOnDueDate from #deals_payments
where ID in (123, 456)
group by due_date
) a
group by a.TotalPaidOnDueDate
Here is a sql fiddle I used to verify: http://sqlfiddle.com/#!18/6b04f/1
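For anyone who wants to try the two-level aggregation without a SQL Server instance, here is a small sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite is a stand-in here, not the engine from the question, and the column names `payment_frequency` and `total_paid` are my own. The rows are the sample data from the question, with the due dates written as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deals_payments (due_date TEXT, payment REAL, id INTEGER)")
conn.executemany(
    "INSERT INTO deals_payments VALUES (?, ?, ?)",
    [("2019-03-01", 1000.0, 123), ("2019-04-01", 1000.0, 123),
     ("2019-05-01", 1000.0, 123), ("2019-06-01", 1000.0, 123),
     ("2019-07-01", 1000.0, 123), ("2019-08-01", 1000.0, 123),
     ("2019-06-01", 500.0, 456), ("2019-07-01", 500.0, 456),
     ("2019-08-01", 500.0, 456)],
)

# Inner query: total paid per due date across both IDs.
# Outer query: how many due dates share each total.
query = """
SELECT COUNT(*) AS payment_frequency, total_paid
FROM (
    SELECT due_date, SUM(payment) AS total_paid
    FROM deals_payments
    WHERE id IN (123, 456)
    GROUP BY due_date
) AS per_date
GROUP BY total_paid
ORDER BY total_paid
"""
result = conn.execute(query).fetchall()
print(result)  # [(3, 1000.0), (3, 1500.0)]
```

March to May only have the payment from ID 123, so they total 1,000 each, while June to August total 1,500 each.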
This seems really strange. I don't understand why your logic is so complicated.
How about this?
select id, count(*), max(payment)
from (select dp.*,
count(*) over (partition by due_date) as cnt
from deals_payments dp
where dp.id in (123, 456)
) dp
where cnt = 2
group by id;
An interesting question. Could this do the trick???
select payment, count(*)
from deals_payments
where due_date in
(select due_date
from deals_payments
group by due_date
having count(*) > 1)
group by payment;
You can add a filter by id if you want, of course.
Situation:
I have 5 columns
id
subtotal (price of item)
order_date (purchase date)
updated_at (if refunded or any other status change)
status
Objective:
I need the order date as column 1
I need to get the subtotal for each day, regardless of the status, as column 2
I need the subtotal amount for refunds for the third column.
Example:
If a purchase is made on May 1st and refunded on May 3rd, the output should look like this:
+-------+----------+--------+
| date | subtotal | refund |
+-------+----------+--------+
| 05-01 | 10.00 | 0.00 |
| 05-02 | 00.00 | 0.00 |
| 05-03 | 00.00 | 10.00 |
+-------+----------+--------+
while the source row looks like this:
+-----+----------+------------+------------+----------+
| id | subtotal | order_date | updated_at | status |
+-----+----------+------------+------------+----------+
| 123 | 10 | 2019-05-01 | 2019-05-03 | refunded |
+-----+----------+------------+------------+----------+
Query:
Currently what I have looks like this:
Note: Because of a timezone discrepancy, the dates are shifted back by 8 hours.
;with cte as (
select id as orderid
, CAST(dateadd(hour,-8,order_date) as date) as order_date
, CAST(dateadd(hour,-8,updated_at) as date) as updated_at
, subtotal
, status
from orders
)
select
b.dates
, sum(a.subtotal_price) as subtotal
, -- not sure how to aggregate it to get the refunds
from Orders as o
inner join cte as a on orders.id=cte.orderid
inner join (select * from cte where status = ('refund')) as b on o.id=cte.orderid
where dates between '2019-05-01' and '2019-05-31'
group by dates
And do I need to join it twice? Hopefully not since my table is huge.
This looks like a job for a Calendar Table. Bit of a stab in the dark, but:
--Overly simplistic Calendar table
CREATE TABLE dbo.Calendar (CalendarDate date);
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
FROM N N1, N N2, N N3, N N4, N N5) --Many years of data
INSERT INTO dbo.Calendar
SELECT DATEADD(DAY, T.I, 0)
FROM Tally T;
GO
SELECT C.CalendarDate AS [date],
CASE C.CalendarDate WHEN V.order_date THEN subtotal ELSE 0 END AS subtotal,
CASE WHEN C.CalendarDate = V.updated_at AND V.[status] = 'refunded' THEN subtotal ELSE 0.00 END AS refund
FROM (VALUES(123,10.00,CONVERT(date,'20190501'),CONVERT(date,'20190503'),'refunded'))V(id,subtotal,order_date,updated_at,status)
JOIN dbo.Calendar C ON V.order_date <= C.CalendarDate AND V.updated_at >= C.CalendarDate;
GO
DROP TABLE dbo.Calendar;
Consider joining on a recursive CTE of sequential dates:
WITH dates AS (
SELECT CONVERT(datetime, '2019-01-01') AS rec_date
UNION ALL
SELECT DATEADD(d, 1, CONVERT(datetime, rec_date))
FROM dates
WHERE rec_date < '2019-12-31'
),
cte AS (
SELECT id AS orderid
, CAST(dateadd(hour,-8,order_date) AS date) as order_date
, CAST(dateadd(hour,-8,updated_at) AS date) as updated_at
, subtotal
, status
FROM orders
)
SELECT rec_date AS date,
CASE
WHEN c.order_date = d.rec_date THEN subtotal
ELSE 0
END AS subtotal,
CASE
WHEN c.updated_at = d.rec_date THEN subtotal
ELSE 0
END AS refund
FROM cte c
JOIN dates d ON d.rec_date BETWEEN c.order_date AND c.updated_at
WHERE c.status = 'refunded'
option (maxrecursion 0)
GO
Rextester demo
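The calendar idea in both answers boils down to: start with one zeroed row per day, then add each order's subtotal on its order date and, if it was refunded, on its update date. Below is a minimal Python sketch of that logic, assuming the status value is 'refunded' as in the sample row. The function name `daily_totals` is hypothetical, and the dates are assumed to be already shifted for the timezone.

```python
from datetime import date, timedelta

def daily_totals(orders, start, end):
    # One entry per calendar day, so days without activity still show zeros
    totals = {}
    day = start
    while day <= end:
        totals[day] = {"subtotal": 0.0, "refund": 0.0}
        day += timedelta(days=1)

    for order in orders:
        # Every order counts towards the subtotal on its order date
        if order["order_date"] in totals:
            totals[order["order_date"]]["subtotal"] += order["subtotal"]
        # Refunded orders also count towards the refund on their update date
        if order["status"] == "refunded" and order["updated_at"] in totals:
            totals[order["updated_at"]]["refund"] += order["subtotal"]
    return totals

# The sample row from the question
orders = [{"id": 123, "subtotal": 10.0, "order_date": date(2019, 5, 1),
           "updated_at": date(2019, 5, 3), "status": "refunded"}]

totals = daily_totals(orders, date(2019, 5, 1), date(2019, 5, 3))
for day, row in totals.items():
    print(day, row["subtotal"], row["refund"])
```

Each order is read once, which is the same reason a single pass with conditional aggregation avoids joining the orders table twice.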
I have a table like the one below. I am trying to write a T-SQL query that gets the earliest and latest cost for each project_id according to the date column, calculates the percent cost increase or decrease, and returns the data set shown in the second table (I have simplified the table in this question).
project_id date cost
-------------------------------
123 7/1/17 5000
123 8/1/17 6000
123 9/1/17 7000
123 10/1/17 8000
123 11/1/17 9000
456 7/1/17 10000
456 8/1/17 9000
456 9/1/17 8000
876 1/1/17 8000
876 6/1/17 5000
876 8/1/17 10000
876 11/1/17 8000
Result:
(Edit: Fixed the result)
project_id "cost incr/decr pct"
------------------------------------------------
123 80% which is (9000-5000)/5000
456 -20%
876 0%
Whatever query I run I get duplicates.
This is what I tried:
select distinct
p1.Proj_ID, p1.date, p2.[cost], p3.cost,
(nullif(p2.cost, 0) / nullif(p1.cost, 0)) * 100 as 'OVER UNDER'
from
[PROJECT] p1
inner join
(select
[Proj_ID], [cost], min([date]) min_date
from
[PROJECT]
group by
[Proj_ID], [cost]) p2 on p1.Proj_ID = p2.Proj_ID
inner join
(select
[Proj_ID], [cost], max([date]) max_date
from
[PROJECT]
group by
[Proj_ID], [cost]) p3 on p1.Proj_ID = p3.Proj_ID
where
p1.date in (p2.min_date, p3.max_date)
Unfortunately, SQL Server does not have a first_value() aggregation function. It does have an analytic function, though. So, you can do:
select distinct project_id,
first_value(cost) over (partition by project_id order by date asc) as first_cost,
first_value(cost) over (partition by project_id order by date desc) as last_cost,
(first_value(cost) over (partition by project_id order by date desc) /
first_value(cost) over (partition by project_id order by date asc)
) - 1 as ratio
from project;
If cost is an integer, you may need to convert to a representation with decimal places.
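As a quick sanity check of the arithmetic, here is a small Python sketch that picks the earliest and latest cost per project and computes the percent change. It uses the sample rows from the question, assuming the dates are in month/day/year order. The function name `pct_change` is hypothetical. Python's `/` is floating-point division, which is the "representation with decimal places" mentioned above.

```python
from datetime import date

def pct_change(rows):
    first, last = {}, {}
    for project_id, day, cost in rows:
        # Track the earliest and the latest (date, cost) pair per project
        if project_id not in first or day < first[project_id][0]:
            first[project_id] = (day, cost)
        if project_id not in last or day > last[project_id][0]:
            last[project_id] = (day, cost)
    # (latest - earliest) / earliest, expressed as a percentage
    return {p: (last[p][1] - first[p][1]) * 100 / first[p][1] for p in first}

rows = [
    (123, date(2017, 7, 1), 5000), (123, date(2017, 8, 1), 6000),
    (123, date(2017, 9, 1), 7000), (123, date(2017, 10, 1), 8000),
    (123, date(2017, 11, 1), 9000),
    (456, date(2017, 7, 1), 10000), (456, date(2017, 8, 1), 9000),
    (456, date(2017, 9, 1), 8000),
    (876, date(2017, 1, 1), 8000), (876, date(2017, 6, 1), 5000),
    (876, date(2017, 8, 1), 10000), (876, date(2017, 11, 1), 8000),
]

result = pct_change(rows)
print(result)  # {123: 80.0, 456: -20.0, 876: 0.0}
```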
You can use row_number and OUTER APPLY over top 1 ... prior to SQL 2012
select
min_.projectid,
latest_.cost - min_.cost [Calculation]
from
(select
row_number() over (partition by projectid order by date) rn
,projectid
,cost
from projectable) min_ -- get the first dates per project
outer apply (
select
top 1
cost
from projectable
where
projectid = min_.projectid -- get the latest cost for each project
order by date desc
) latest_
where min_.rn = 1
This might perform a little better
;with costs as (
select *,
ROW_NUMBER() over (PARTITION BY project_id ORDER BY date) mincost,
ROW_NUMBER() over (PARTITION BY project_id ORDER BY date desc) maxcost
from table1
)
select project_id,
min(case when mincost = 1 then cost end) as cost1,
max(case when maxcost = 1 then cost end) as cost2,
(max(case when maxcost = 1 then cost end) - min(case when mincost = 1 then cost end)) * 100 / min(case when mincost = 1 then cost end) as [OVER UNDER]
from costs a
group by project_id
I have a table with more than 5 million rows of sales transactions. I would like to find the sum of the date intervals between each customer's three most recent purchases.
Suppose my table looks like this :
CustomerID ProductID ServiceStartDate ServiceExpiryDate
A X1 2010-01-01 2010-06-01
A X2 2010-08-12 2010-12-30
B X4 2011-10-01 2012-01-15
B X3 2012-04-01 2012-06-01
B X7 2012-08-01 2013-10-01
A X5 2013-01-01 2015-06-01
The result I'm looking for looks like this:
CustomerID IntervalDays
A 802
B 135
I know the query needs to first retrieve the 3 most recent transactions of each customer (based on ServiceStartDate) and then calculate the interval between the StartDate and ExpiryDate of his/her transactions.
You want to calculate the difference between the previous row's ServiceExpiryDate and the current row's ServiceStartDate based on descending dates and then sum up the last two differences:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc
, ServiceExpiryDate desc -- don't know if this 2nd column is necessary
) as rn
from tab
)
select t2.customerId,
sum(datediff(day, prevEnd, ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte as t2 left join cte as t1
on t1.customerId = t2.customerId
and t1.rn = t2.rn+1 -- previous and current row
where t2.rn <= 3 -- last three rows
group by t2.customerId;
Same result using LEAD:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc) as rn
,lead(ServiceExpiryDate)
over (partition by customerId
order by ServiceStartDate desc
) as prevEnd
from tab
)
select customerId,
sum(datediff(day, prevEnd, ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte
where rn <= 3
group by customerId;
Neither query returns the expected result unless you subtract purchases (or max(rn)) from Intervaldays. But as you only sum two differences, this does not seem correct to me either...
Additional logic must be applied based on your rules regarding:
customer has less than 3 purchases
overlapping intervals
Assuming there are no overlaps, I think you want this:
select customerId,
sum(datediff(day, ServiceStartDate, ServiceExpiryDate)) as Intervaldays
from (select t.*, row_number() over (partition by customerId
order by ServiceStartDate desc) as seqnum
from table t
) t
where seqnum <= 3
group by customerId;
Try this:
SELECT dt.CustomerID,
SUM(DATEDIFF(DAY, dt.PrevExpiry, dt.ServiceStartDate)) As IntervalDays
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn
, (SELECT Max(ti.ServiceExpiryDate)
FROM yourTable ti
WHERE t.CustomerID = ti.CustomerID
AND ti.ServiceStartDate < t.ServiceStartDate) As PrevExpiry
FROM yourTable t )dt
GROUP BY dt.CustomerID
Result will be:
CustomerId | IntervalDays
-----------+--------------
A | 805
B | 138
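To see where 805 and 138 come from, here is a minimal Python sketch that sums the gaps between one purchase's expiry date and the next purchase's start date, over each customer's three most recent purchases. It uses the sample rows from the question and assumes there are no overlapping intervals. The function name `gap_days` is hypothetical.

```python
from collections import defaultdict
from datetime import date

def gap_days(rows, last_n=3):
    by_customer = defaultdict(list)
    for customer, start, expiry in rows:
        by_customer[customer].append((start, expiry))

    result = {}
    for customer, periods in by_customer.items():
        # Keep the last_n most recent purchases, ordered by start date
        recent = sorted(periods)[-last_n:]
        # Gap = next start date minus previous expiry date
        result[customer] = sum(
            (nxt[0] - prev[1]).days for prev, nxt in zip(recent, recent[1:])
        )
    return result

rows = [
    ("A", date(2010, 1, 1), date(2010, 6, 1)),
    ("A", date(2010, 8, 12), date(2010, 12, 30)),
    ("B", date(2011, 10, 1), date(2012, 1, 15)),
    ("B", date(2012, 4, 1), date(2012, 6, 1)),
    ("B", date(2012, 8, 1), date(2013, 10, 1)),
    ("A", date(2013, 1, 1), date(2015, 6, 1)),
]

result = gap_days(rows)
print(result)  # {'A': 805, 'B': 138}
```

For customer A the two gaps are 72 and 733 days. Each total is 3 more than the figure in the question (802 and 135), which matches the earlier remark about subtracting the number of purchases.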
I have these two tables:
Expense table
id expense expense_date product_id
1 10 2012-01-03 1
2 10 2014-02-01 2
3 10 2014-02-03 1
4 10 2012-07-03 1
Product table
product_id product_name purchase_date
1 car 2010-02-01
2 bike 2014-03-01
I would like to get a result like the one below by summing the expenses per product_id, where each row covers the expenses whose expense_date falls within a one-year period counted from the purchase_date:
Year Expense Product_name
1 0 car 2010-02-01 to 2011-02-01
2 10 car 2011-02-01 to 2012-02-01
3 10 car 2012-02-01 to 2013-02-01
4 0 car 2013-02-01 to 2014-02-01
5 10 car 2014-02-01 to 2015-02-01
1 10 bike 2014-03-01 to 2015-03-01
CREATE VIEW dbo.vwMaxExpenseDate
AS
SELECT product_id, MAX(expense_date) AS 'max_expense_date'
FROM Expense
GROUP BY product_id
GO
-- Table variable holding one row per product and yearly period
DECLARE @PossibleYearRange TABLE
(
product_id INT,
YearStart DATETIME,
YearEnd DATETIME
);
-- Recursive CTE: one row per product and year number, up to the year of the last expense
WITH CTE
AS
(
SELECT p.product_id, max_expense_date, purchase_date, 1 As Number
FROM Product p
LEFT JOIN vwMaxExpenseDate e
ON p.product_id = e.product_id
UNION ALL
SELECT product_id, max_expense_date, purchase_date, Number + 1
FROM CTE
WHERE Number <= (YEAR(max_expense_date) - YEAR(purchase_date))
)
INSERT INTO @PossibleYearRange
(
product_id,
YearStart,
YearEnd
)
SELECT product_id,
CONVERT(DATETIME, CONVERT(VARCHAR(20), YEAR(purchase_date) + Number - 1) + '-'
+ CONVERT(VARCHAR(20), MONTH(purchase_date)) + '-'
+ CONVERT(VARCHAR(20), DAY(purchase_date))) AS 'YearStart',
CONVERT(DATETIME, CONVERT(VARCHAR(20), YEAR(purchase_date) + Number) + '-'
+ CONVERT(VARCHAR(20), MONTH(purchase_date)) + '-'
+ CONVERT(VARCHAR(20), DAY(purchase_date))) AS 'YearEnd'
FROM CTE
ORDER BY product_id ASC, NUMBER ASC
-- Sum the expenses that fall inside each yearly period
SELECT MAX(p.product_id) AS product_id, MAX(product_name) AS product_name, YearStart, YearEnd, COALESCE(SUM(expense), 0) AS TotalExpensePerYear
FROM @PossibleYearRange p
LEFT JOIN Expense e
ON p.product_id = e.product_id AND
expense_date BETWEEN YearStart AND YearEnd
INNER JOIN Product d
ON p.product_id = d.product_id
GROUP BY YearStart, YearEnd
ORDER BY MAX(p.product_id) ASC
Hope this helps! Thanks.
declare @BeginsAt as datetime
declare @numOf as int
set @BeginsAt = (select min(purchase_date) from Products)
set @BeginsAt = dateadd(year,datediff(year,0,@BeginsAt),0) -- force to 1st of Year
set @numOf = (year(getdate()) - year(@BeginsAt))+1
;with YearRange (id, StartAt, StopAt)
as (
select 1 as id, @BeginsAt, dateadd(Year,1,@BeginsAt)
union all
select (id + 1) , dateadd(Year,1,StartAt) , dateadd(Year,1,StopAt)
from YearRange
where (id + 1) <= @numOf
)
)
select
y.id
, coalesce(e.expense,0) expense
, p.product_name
, y.startAt
, dateadd(day,-1,y.StopAt)
from YearRange Y
left join Products P on Y.StopAt between P.purchase_date AND (select max(StopAt) from YearRange)
left join Expenses E on E.expense_date >= Y.StartAt and E.expense_date < Y.StopAt
and E.product_id = P.product_id
See this SQLFiddle demo
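As a cross-check of the yearly bucketing, here is a minimal Python sketch for a single product. It builds one-year periods starting at the purchase date and sums the expenses that fall in each period. It is a sketch under assumptions, not a translation of either query: periods are treated as half-open (start inclusive, end exclusive), the expense list is non-empty, and the purchase date is not February 29. The function name `yearly_expenses` is hypothetical, and the data is the car from the question.

```python
from datetime import date

def yearly_expenses(purchase_date, expenses):
    # expenses: list of (expense_date, amount); assumed non-empty
    last = max(d for d, _ in expenses)
    buckets = []
    offset = 0
    while True:
        # Anniversary of the purchase date (assumes it is not Feb 29)
        start = purchase_date.replace(year=purchase_date.year + offset)
        if start > last:
            break
        end = purchase_date.replace(year=purchase_date.year + offset + 1)
        # Half-open period: start <= expense_date < end
        total = sum(a for d, a in expenses if start <= d < end)
        buckets.append((offset + 1, start, end, total))
        offset += 1
    return buckets

# product_id 1 (car), purchased 2010-02-01
car_expenses = [(date(2012, 1, 3), 10), (date(2014, 2, 3), 10), (date(2012, 7, 3), 10)]
car = yearly_expenses(date(2010, 2, 1), car_expenses)
for year_no, start, end, total in car:
    print(year_no, total, "car", start, "to", end)
```

The half-open comparison keeps an expense dated exactly on an anniversary from being counted in two adjacent periods, which an inclusive BETWEEN on both ends would allow.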