Imagine a large table that contains receipt information. Since it holds so much data, you are required to return a subset of the data, excluding or consolidating rows where possible.
Here is the SQL and results table showing how the data should be returned.
create table table1
(RecieptNo smallint, Customer varchar(10), ReceiptDate date,
ItemDesc varchar(10), Amount smallint)
insert into table1 values
(100, 'Matt','2022-01-05','Ball', 10),
(101, 'Mark','2022-01-07','Hat', 20),
(101, 'Mark','2022-01-07','Jumper', -20),
(101, 'Mark','2022-01-14','Spoon', 30),
(102, 'Luke','2022-01-15','Fork', 15),
(102, 'Luke','2022-01-17','Spork', -10),
(103, 'John','2022-01-20','Orange', 13),
(103, 'John','2022-01-25','Pear', 12)
If there are rows on the same receipt where the negative and positive values cancel out, do not return either row.
If a receipt has a negative amount that does not exceed the positive amount, the negative amount should be deducted from the positive line.
RecieptNo  Customer  ReceiptDate  ItemDesc  Amount
---------  --------  -----------  --------  ------
100        Matt      2022-01-05   Ball      10
101        Mark      2022-01-14   Spoon     30
102        Luke      2022-01-15   Fork      5
103        John      2022-01-20   Orange    13
103        John      2022-01-25   Pear      12
This is proving tricky, any ideas?
Based on the table you provided, I suppose that when a receipt has multiple rows that net to a positive Amount after deduction, you want only the row with the earliest date.
;WITH cte AS (
SELECT *
, SUM( amount) OVER (PARTITION BY RecieptNo ORDER BY RecieptNo, ReceiptDate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS ActualAmount
, ROW_NUMBER() OVER (PARTITION BY RecieptNo ORDER BY RecieptNo, ReceiptDate) AS rn
FROM table1)
SELECT RecieptNo, Customer, ReceiptDate, ItemDesc, ActualAmount
FROM cte
WHERE ActualAmount > 0 AND rn = 1
Read about window functions and CTEs though.
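To see what the query above returns on the sample rows, here is a minimal sketch that ports it to SQLite through Python's standard-library sqlite3 module. That port is an assumption on my part: the original targets SQL Server, and window functions need SQLite 3.25 or newer. Only the receipt number and net amount are selected, because the two rows of receipt 101 dated 2022-01-07 tie on ReceiptDate and ROW_NUMBER may pick either one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (RecieptNo INT, Customer TEXT, ReceiptDate TEXT,
                     ItemDesc TEXT, Amount INT);
INSERT INTO table1 VALUES
  (100, 'Matt', '2022-01-05', 'Ball',   10),
  (101, 'Mark', '2022-01-07', 'Hat',    20),
  (101, 'Mark', '2022-01-07', 'Jumper', -20),
  (101, 'Mark', '2022-01-14', 'Spoon',  30),
  (102, 'Luke', '2022-01-15', 'Fork',   15),
  (102, 'Luke', '2022-01-17', 'Spork',  -10),
  (103, 'John', '2022-01-20', 'Orange', 13),
  (103, 'John', '2022-01-25', 'Pear',   12);
""")

# SUM over the whole partition gives the net amount per receipt;
# ROW_NUMBER keeps only the earliest-dated row of each receipt.
net_per_receipt = conn.execute("""
WITH cte AS (
  SELECT *,
         SUM(Amount)  OVER (PARTITION BY RecieptNo) AS ActualAmount,
         ROW_NUMBER() OVER (PARTITION BY RecieptNo ORDER BY ReceiptDate) AS rn
  FROM table1
)
SELECT RecieptNo, ActualAmount
FROM cte
WHERE ActualAmount > 0 AND rn = 1
ORDER BY RecieptNo
""").fetchall()

print(net_per_receipt)  # [(100, 10), (101, 30), (102, 5), (103, 25)]
```

On this data the sketch returns one row per receipt, so receipt 103 comes back once with a net of 25 rather than as the two separate rows in the question's expected table.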
Forgive me if I word this poorly.
And sorry if it has already been asked, but I was not able to find an answer here.
I'm using Snowflake to try and do the below.
Basically, I'm trying to find out, for each customer, how many times they have placed an order after a specific date.
Scenario:
We want to see if customers continue to shop with us after they have been short-shipped (received 1 or more items less than they ordered).
So for example:
customer 1 places an order on 01/01/2020 and this was a short-shipment.
they then go on to place an order 06/06/2020 and 02/02/2021.
so this customer has a total of 2 additional orders since they were short-shipped on 01/01/2020.
customer 2 places an order on 02/03/2020 and this was short-shipped.
customer 2 has not since placed an order, so they will have 0 additional orders.
Data available:
cust_id  ord_id  order_date
-------  ------  ----------
1        0123    01/01/2020
1        0456    06/06/2020
1        0789    02/02/2021
2        1011    01/01/2020
Desired output:
cust_id  number_of_orders
-------  ----------------
1        2
2        0
So using a boosted version of your data:
with data_cte( cust_id, ord_id, order_date, short_order_flg) as (
select * from values
(1, '1', '2018-06-06'::date, false),
(1, '2', '2019-01-01'::date, true),
(1, '3', '2019-06-06'::date, false),
(1, '4', '2019-12-02'::date, false),
(1, '5', '2020-01-01'::date, true),
(1, '6', '2020-06-06'::date, false),
(1, '7', '2021-02-02'::date, false),
(2, '8', '2020-01-01'::date, true)
)
which shows a "valid" purchase, multiple "short ships" and how to batch them
SELECT
cust_id,
min(order_date) as short_date,
count(*) -1 as follow_count
FROM (
select
cust_id
,order_date
,CONDITIONAL_TRUE_EVENT(short_order_flg) over(partition by cust_id order by order_date ) as edge
from data_cte
)
where edge > 0
group by 1, edge
order by 1,2;
gives:
CUST_ID  SHORT_DATE  FOLLOW_COUNT
-------  ----------  ------------
1        2019-01-01  2
1        2020-01-01  2
2        2020-01-01  0
The key thing to note is that CONDITIONAL_TRUE_EVENT increments each time the event happens, so (cust_id, edge) works as a batch key. Rows before the first event have an edge of zero, hence the WHERE filter.
Finally, each "post short" batch includes the short-shipped order itself, so we need to subtract one from the count.
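Outside Snowflake there is no CONDITIONAL_TRUE_EVENT, but the same batch key can be approximated with a running SUM over a 0/1 flag. The sketch below is a hedged port to SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer); the table name orders and the integer encoding of short_order_flg are my assumptions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (cust_id INT, ord_id TEXT, order_date TEXT,
                     short_order_flg INT);  -- 1 = short-shipped, 0 = normal
INSERT INTO orders VALUES
  (1, '1', '2018-06-06', 0),
  (1, '2', '2019-01-01', 1),
  (1, '3', '2019-06-06', 0),
  (1, '4', '2019-12-02', 0),
  (1, '5', '2020-01-01', 1),
  (1, '6', '2020-06-06', 0),
  (1, '7', '2021-02-02', 0),
  (2, '8', '2020-01-01', 1);
""")

# The running SUM of the flag plays the role of CONDITIONAL_TRUE_EVENT:
# it goes up by one at every short-shipped order and stays 0 before the first.
batches = conn.execute("""
SELECT cust_id,
       MIN(order_date) AS short_date,
       COUNT(*) - 1    AS follow_count   -- exclude the short order itself
FROM (
  SELECT cust_id, order_date,
         SUM(short_order_flg) OVER (PARTITION BY cust_id
                                    ORDER BY order_date) AS edge
  FROM orders
)
WHERE edge > 0
GROUP BY cust_id, edge
ORDER BY cust_id, short_date
""").fetchall()

for row in batches:
    print(row)
```

With the sample rows this prints the same three batches as the Snowflake result: two follow-up orders after each of customer 1's short shipments and none for customer 2.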
Try this
with CTE as (
select 1 as cust_id, '0123' as ord_id, '2020-01-01'::date as order_date, 1 as short_order_flg union all
select 1 as cust_id, '0456' as ord_id, '2020-06-06'::date as order_date, 0 as short_order_flg union all
select 1 as cust_id, '0789' as ord_id, '2021-02-02'::date as order_date, 0 as short_order_flg union all
select 2 as cust_id, '1011' as ord_id, '2020-01-01'::date as order_date, 1 as short_order_flg
),
following_orders as (
select cust_id, short_order_flg, count(ord_id) over (partition by cust_id order by order_date rows between current row and unbounded following) - 1 as number_of_orders
from cte
order by cust_id, order_date
)
select cust_id, number_of_orders
from following_orders
where short_order_flg = 1
;
I added column short_order_flg to indicate which record represents the short order. Then I used window function count(ord_id) over(...) to calculate the number of orders following each order, subtracting 1 to exclude the current record itself. Finally, I applied a filter to select only the short order records.
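As a rough check of this second approach, the sketch below ports it to SQLite via Python's sqlite3 module (SQLite 3.25 or newer) and runs it on the larger sample from the earlier answer instead of the four question rows. Using that sample, the table name orders, and the extra order_date column in the output are my choices, made to show how the count behaves when a customer has two short shipments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (cust_id INT, ord_id TEXT, order_date TEXT,
                     short_order_flg INT);
INSERT INTO orders VALUES
  (1, '1', '2018-06-06', 0),
  (1, '2', '2019-01-01', 1),
  (1, '3', '2019-06-06', 0),
  (1, '4', '2019-12-02', 0),
  (1, '5', '2020-01-01', 1),
  (1, '6', '2020-06-06', 0),
  (1, '7', '2021-02-02', 0),
  (2, '8', '2020-01-01', 1);
""")

# Count the current row plus everything after it, then subtract the current row.
following = conn.execute("""
WITH following_orders AS (
  SELECT cust_id, order_date, short_order_flg,
         COUNT(ord_id) OVER (PARTITION BY cust_id ORDER BY order_date
                             ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) - 1
           AS number_of_orders
  FROM orders
)
SELECT cust_id, order_date, number_of_orders
FROM following_orders
WHERE short_order_flg = 1
ORDER BY cust_id, order_date
""").fetchall()

print(following)
# [(1, '2019-01-01', 5), (1, '2020-01-01', 2), (2, '2020-01-01', 0)]
```

Here every later order counts, including later short-shipped ones, so the first short shipment shows 5 follow-up orders where the batching query reports 2. Which behaviour is wanted depends on how repeat short shipments should be treated.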
Below is the table I have created and inserted values in it:
CREATE TABLE Invoices
(
InvID int,
InvAmount int
)
GO
INSERT INTO Invoices
VALUES (1, 543), (2, 749)
CREATE TABLE payments
(
PayID int IDENTITY (1, 1),
InvID int,
PayAmount int,
PayDate date
)
INSERT INTO payments
VALUES (1, 20, '2016-01-01'),
(1, 35, '2016-01-07'),
(1, 78, '2016-01-13'),
(1, 52, '2016-01-25'),
(2, 40, '2016-01-03'),
(2, 54, '2016-01-15'),
(2, 63, '2016-01-17'),
(2, 59, '2016-01-28')
SELECT * FROM Invoices
SELECT * FROM payments
As shown in the screenshot above, the Invoice table specifies various customer billings (the first billing totals 543, the second billing totals 749).
As shown in the screenshot above, the payments table specifies the various payments the customer made for each of the billings. For example, one can see that on January 1st the customer paid 20 USD out of billing no. 1 (which totals 543 USD), and on January 3rd the customer paid 40 USD out of billing no. 2, (which totals 749 USD).
Now the question is:
Write a query that displays the billing balance, based on the number of payments made so far.
The query result should exactly look like the screenshot below:
This is what I have tried:
SELECT
payments.InvID,
InvAmount - SUM(PayAmount) OVER (PARTITION BY payments.InvID ORDER BY PayID
ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING) AS 'InvAmount',
PayDate, PayAmount,
InvAmount - SUM(PayAmount) OVER (PARTITION BY payments.InvID ORDER BY PayID) AS 'Balance'
FROM
Invoices
JOIN
payments ON payments.InvID = Invoices.InvID
After running the query, I got the following result which is shown below:
As you can see from the screenshot above, I nearly got the result I wanted.
The only problem is that InvAmount is exactly returning the same row values as Balance. I am not able to retain the starting row values of InvAmount which are 543 (InvID = 1) and 749 (InvID = 2) respectively.
How can this issue be solved?
You can add back the PayAmount in the calculation
InvAmount
+ PayAmount
- SUM(PayAmount) OVER (PARTITION BY payments.InvID
ORDER BY PayID
ROWS BETWEEN UNBOUNDED PRECEDING
AND 0 FOLLOWING) AS InvAmount
Or use ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING, but then you need to handle the NULL value for the very first row:
InvAmount
- ISNULL(SUM(PayAmount) OVER (PARTITION BY payments.InvID
ORDER BY PayID
ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING), 0) AS InvAmount
db<>fiddle demo
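For readers without SQL Server to hand, here is a minimal sketch of the "1 PRECEDING" variant ported to SQLite through Python's sqlite3 module (SQLite 3.25 or newer). COALESCE stands in for ISNULL and AUTOINCREMENT for IDENTITY; both substitutions are assumptions of the port, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Invoices (InvID INT, InvAmount INT);
INSERT INTO Invoices VALUES (1, 543), (2, 749);

CREATE TABLE payments (PayID INTEGER PRIMARY KEY AUTOINCREMENT,
                       InvID INT, PayAmount INT, PayDate TEXT);
INSERT INTO payments (InvID, PayAmount, PayDate) VALUES
  (1, 20, '2016-01-01'), (1, 35, '2016-01-07'),
  (1, 78, '2016-01-13'), (1, 52, '2016-01-25'),
  (2, 40, '2016-01-03'), (2, 54, '2016-01-15'),
  (2, 63, '2016-01-17'), (2, 59, '2016-01-28');
""")

# Amount before the payment: invoice minus all EARLIER payments
# (the frame is empty on the first row, so the SUM is NULL and becomes 0).
# Balance after the payment: invoice minus payments up to and including this row.
balances = conn.execute("""
SELECT p.InvID,
       i.InvAmount - COALESCE(SUM(p.PayAmount) OVER (
                        PARTITION BY p.InvID ORDER BY p.PayID
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0)
         AS InvAmount,
       p.PayAmount,
       p.PayDate,
       i.InvAmount - SUM(p.PayAmount) OVER (
                        PARTITION BY p.InvID ORDER BY p.PayID) AS Balance
FROM Invoices AS i
JOIN payments AS p ON p.InvID = i.InvID
ORDER BY p.InvID, p.PayID
""").fetchall()

for row in balances:
    print(row)
```

The first printed row is (1, 543, 20, '2016-01-01', 523) and each later row starts from the previous row's balance, which is the shape the question asks for.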
You can use PARTITION BY clause for invoice identifier and then deduct the pay amount from the invoice amount as given below
;WITH CTE_Balance as
(
SELECT i.InvID, i.InvAmount, p.PayAmount, p.PayDate
, InvAmount - sum(payamount) over (partition by p.invid order by paydate rows between unbounded preceding and current row) as balance
, ROW_NUMBER() over(partition by p.invid order by p.paydate) as rnk
FROM payments as p
inner join Invoices as i
on i.InvID = p.InvID
)
SELECT invid, case when rnk =1 then invamount else lag(balance) over(partition by invid order by paydate) end as invamount
,payAmount, paydate, balance
FROM CTE_Balance
invid  invamount  payAmount  paydate     balance
-----  ---------  ---------  ----------  -------
1      543        20         2016-01-01  523
1      523        35         2016-01-07  488
1      488        78         2016-01-13  410
1      410        52         2016-01-25  358
2      749        40         2016-01-03  709
2      709        54         2016-01-15  655
2      655        63         2016-01-17  592
2      592        59         2016-01-28  533
I am trying to figure out how to write a query that will give me the correct historical data between dates. But only using sql. I know it is possible coding a loop, but I'm not sure if this is possible in a SQL query. Dates: DD/MM/YYYY
An Example of Data
ID  Points  DATE
--  ------  ----------
1   10      01/01/2018
1   20      02/01/2019
1   25      03/01/2020
1   10      04/01/2021
With a simple query
SELECT ID, Points, MIN(Date), MAX(Date)
FROM table
GROUP BY ID,POINTS
The Min date for 10 points would be 01/01/2018 and the Max date would be 04/01/2021, which would be wrong in this instance. It should be:
ID  Points  Min DATE    Max DATE
--  ------  ----------  ----------
1   10      01/01/2018  01/01/2019
1   20      02/01/2019  02/01/2020
1   25      03/01/2020  03/01/2021
1   10      04/01/2021  04/01/2021
I was thinking of using LAG, but need some ideas here. What I haven't told you is there is a record per day. So I would need to group until a change of points. This is to create a view from the data that I already have.
It looks like - for your sample data set - the following lead should suffice:
select id, points, date as MinDate,
IsNull(DateAdd(day, -1, Lead(Date,1) over(partition by Id order by Date)), Date) as MaxDate
from t
Example Fiddle
I'm guessing you want the MAX date to be 1 day before the next MIN date.
And you can use the window function LEAD to get the next MIN date.
And if you group also by the year, then the date ranges match the expected result.
SELECT ID, Points
, MIN([Date]) AS [Min Date]
, COALESCE(DATEADD(day, -1, LEAD(MIN([Date])) OVER (PARTITION BY ID ORDER BY MIN([Date]))), MAX([Date])) AS [Max Date]
FROM your_table
GROUP BY ID, Points, YEAR([Date]);
ID  Points  Min Date    Max Date
--  ------  ----------  ----------
1   10      2018-01-01  2019-01-01
1   20      2019-01-02  2020-01-02
1   25      2020-01-03  2021-01-03
1   10      2021-01-04  2021-01-04
Test on db<>fiddle here
We can do this by creating two tables one with the minimum and one with the maximum date for each grouping and then combining them
CREATE TABLE dataa(
id INT,
points INT,
ddate DATE);
INSERT INTO dataa values(1 , 10 ,'2018-10-01');
INSERT INTO dataa values(1 , 20 ,'2019-01-02');
INSERT INTO dataa values(1 , 25 ,'2020-01-03');
INSERT INTO dataa values(1 , 10 ,'2021-01-04');
SELECT
mi.id, mi.points,mi.date minDate, ma.date maxDate
FROM
(select id, points, min(ddate) date from dataa group by id,points) mi
JOIN
(select id, points, max(ddate) date from dataa group by id,points) ma
ON
mi.id = ma.id
AND
mi.points = ma.points;
DROP TABLE dataa;
this gives the following output
+------+--------+------------+------------+
| id | points | minDate | maxDate |
+------+--------+------------+------------+
| 1 | 10 | 2018-10-01 | 2021-01-04 |
| 1 | 20 | 2019-01-02 | 2019-01-02 |
| 1 | 25 | 2020-01-03 | 2020-01-03 |
+------+--------+------------+------------+
I've used the default date formatting. This could be modified if you wish.
*** See my other answer, as I don't think this answer is correct after re-examining the OP's question. Leaving this answer in place in case it has any value.
As I understand the problem, consecutive daily rows with the same value for a given ID may be ignored. This can be done by examining the prior value using the LAG() function and excluding records where the current value is unchanged from the prior.
From the remaining records, the LEAD() function can be used to look ahead to the next included record to extract the date where this value is superseded. Max Date is then calculated as one day prior.
Below is an example that includes expanded test data to cover multiple IDs and repeated Points values.
DECLARE @Data TABLE (Id INT, Points INT, Date DATE)
INSERT @Data
VALUES
(1, 10, '2018-01-01'), -- Start
(1, 20, '2019-01-02'), -- Updated
(1, 25, '2020-01-03'), -- Updated
(1, 10, '2021-01-04'), -- Updated
(2, 10, '2022-01-01'), -- Start
(2, 20, '2022-02-01'), -- Updated
(2, 20, '2022-03-01'), -- No change
(2, 20, '2022-04-01'), -- No change
(2, 20, '2022-05-01'), -- No change
(2, 25, '2022-06-01'), -- Updated
(2, 25, '2022-07-01'), -- No change
(2, 20, '2022-08-01'), -- Updated
(2, 25, '2022-09-08'), -- Updated
(2, 10, '2022-10-09'), -- Updated
(3, 10, '2022-01-01'), -- Start
(3, 10, '2022-01-02'), -- No change
(3, 20, '2022-01-03'), -- Updated
(3, 20, '2022-01-04'), -- No change
(3, 20, '2022-01-05'), -- No change
(3, 10, '2022-01-06'), -- Updated
(3, 10, '2022-01-07'); -- No change
WITH CTE AS (
SELECT *, PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY Date)
FROM @Data
)
SELECT ID, Points, MinDate = Date,
MaxDate = DATEADD(day, -1, (LEAD(Date) OVER (PARTITION BY Id ORDER BY Date)))
FROM CTE
WHERE (PriorPoints <> Points OR PriorPoints IS NULL) -- Exclude unchanged
ORDER BY Id, Date
Results:
ID  Points  MinDate     MaxDate
--  ------  ----------  ----------
1   10      2018-01-01  2019-01-01
1   20      2019-01-02  2020-01-02
1   25      2020-01-03  2021-01-03
1   10      2021-01-04  null
2   10      2022-01-01  2022-01-31
2   20      2022-02-01  2022-05-31
2   25      2022-06-01  2022-07-31
2   20      2022-08-01  2022-09-07
2   25      2022-09-08  2022-10-08
2   10      2022-10-09  null
3   10      2022-01-01  2022-01-02
3   20      2022-01-03  2022-01-05
3   10      2022-01-06  null
db<>fiddle
For the last value for a given ID, the calculated MaxDate is NULL indicating no upper bound to the date range. If you really want MaxDate = MinDate for this case, you can add ISNULL( ..., Date).
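The LAG-then-LEAD idea can be tried without SQL Server. Below is a minimal sketch ported to SQLite via Python's sqlite3 module (SQLite 3.25 or newer), using only the ID 3 rows from the test data; the table name points_log, the column name log_date, and the use of SQLite's date(..., '-1 day') in place of DATEADD are assumptions of the port.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points_log (id INT, points INT, log_date TEXT);
INSERT INTO points_log VALUES
  (3, 10, '2022-01-01'),  -- start
  (3, 10, '2022-01-02'),  -- no change
  (3, 20, '2022-01-03'),  -- updated
  (3, 20, '2022-01-04'),  -- no change
  (3, 20, '2022-01-05'),  -- no change
  (3, 10, '2022-01-06'),  -- updated
  (3, 10, '2022-01-07');  -- no change
""")

# Step 1 (CTE): look back one row to find the prior points value.
# Step 2 (outer query): keep only rows where the value changed, then look
# ahead to the next kept row; its date minus one day closes the range.
ranges = conn.execute("""
WITH cte AS (
  SELECT *,
         LAG(points) OVER (PARTITION BY id ORDER BY log_date) AS prior_points
  FROM points_log
)
SELECT id, points,
       log_date AS min_date,
       date(LEAD(log_date) OVER (PARTITION BY id ORDER BY log_date),
            '-1 day') AS max_date
FROM cte
WHERE prior_points IS NULL OR prior_points <> points
ORDER BY id, log_date
""").fetchall()

for row in ranges:
    print(row)
```

The WHERE clause is applied before the window function in the outer query, which is why LEAD sees only the rows where the value changed. The last range comes back with a max_date of None, matching the NULL described above.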
(I am adding this as an alternative (and simpler) interpretation of the OP's question.)
Problem restatement: Given a collection of IDs, Dates, and Points values, a group is defined as any consecutive sequence of the same Points value for a given ID and ascending dates. For each such group, calculate the min and max dates.
The start of such a group can be identified as a row where the Points value changes from the preceding value, or if there is no preceding value for a given ID. If we first tag such rows (NewGroup = 1), we can then assign group numbers based on a count of preceding tagged rows (including the current row). Once we have assigned group numbers, it is then a simple matter to apply a group and aggregate operation.
Below is a sample that includes some additional test data to show multiple IDs and repeating values.
DECLARE @Data TABLE (Id INT, Points INT, Date DATE)
INSERT @Data
VALUES
(1, 10, '2018-01-01'), -- Start
(1, 20, '2019-01-02'), -- Updated
(1, 25, '2020-01-03'), -- Updated
(1, 10, '2021-01-04'), -- Updated
(2, 10, '2022-01-01'), -- Start
(2, 20, '2022-02-01'), -- Updated
(2, 20, '2022-03-01'), -- No change
(2, 20, '2022-04-01'), -- No change
(2, 20, '2022-05-01'), -- No change
(2, 25, '2022-06-01'), -- Updated
(2, 25, '2022-07-01'), -- No change
(2, 20, '2022-08-01'), -- Updated
(2, 25, '2022-09-08'), -- Updated
(2, 10, '2022-10-09'), -- Updated
(3, 10, '2022-01-01'), -- Start
(3, 10, '2022-01-02'), -- No change
(3, 20, '2022-01-03'), -- Updated
(3, 20, '2022-01-04'), -- No change
(3, 20, '2022-01-05'), -- No change
(3, 10, '2022-01-06'), -- Updated
(3, 10, '2022-01-07'); -- No change
WITH CTE AS (
SELECT *,
PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY Date)
FROM @Data
)
, CTE2 AS (
SELECT *,
NewGroup = CASE WHEN (PriorPoints <> Points OR PriorPoints IS NULL)
THEN 1 ELSE 0 END
FROM CTE
)
, CTE3 AS (
SELECT *, GroupNo = SUM(NewGroup) OVER(
PARTITION BY ID
ORDER BY Date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM CTE2
)
SELECT Id, Points, MinDate = MIN(Date), MaxDate = MAX(Date)
FROM CTE3
GROUP BY Id, GroupNo, Points
ORDER BY Id, GroupNo
Results:
Id  Points  MinDate     MaxDate
--  ------  ----------  ----------
1   10      2018-01-01  2018-01-01
1   20      2019-01-02  2019-01-02
1   25      2020-01-03  2020-01-03
1   10      2021-01-04  2021-01-04
2   10      2022-01-01  2022-01-01
2   20      2022-02-01  2022-05-01
2   25      2022-06-01  2022-07-01
2   20      2022-08-01  2022-08-01
2   25      2022-09-08  2022-09-08
2   10      2022-10-09  2022-10-09
3   10      2022-01-01  2022-01-02
3   20      2022-01-03  2022-01-05
3   10      2022-01-06  2022-01-07
To see the intermediate results, replace the final select with SELECT * FROM CTE3 ORDER BY Id, Date.
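If you wish to treat gaps in dates as group criteria, add a PriorDate calculation (LAG(Date) with the same PARTITION BY and ORDER BY) to CTE and add OR Date <> DATEADD(day, 1, PriorDate) to the NewGroup condition, so that a row starts a new group whenever it is not exactly one day after the previous row.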
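The date-gap variant can be sketched as follows, ported to SQLite through Python's sqlite3 module (SQLite 3.25 or newer). I am reading the extra condition as "the row is not exactly one day after the prior row", and the table name points_log, the column names, and the five sample rows are hypothetical, chosen so that one group is split by a missing day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points_log (id INT, points INT, log_date TEXT);
INSERT INTO points_log VALUES
  (1, 10, '2022-01-01'),
  (1, 10, '2022-01-02'),
  (1, 10, '2022-01-05'),  -- same points, but two days are missing
  (1, 20, '2022-01-06'),
  (1, 20, '2022-01-07');
""")

groups = conn.execute("""
WITH cte AS (      -- look back one row for both the value and the date
  SELECT *,
         LAG(points)   OVER (PARTITION BY id ORDER BY log_date) AS prior_points,
         LAG(log_date) OVER (PARTITION BY id ORDER BY log_date) AS prior_date
  FROM points_log
),
cte2 AS (          -- tag rows that start a new group
  SELECT *,
         CASE WHEN prior_points IS NULL
                OR prior_points <> points
                OR log_date <> date(prior_date, '+1 day')
              THEN 1 ELSE 0 END AS new_group
  FROM cte
),
cte3 AS (          -- running count of tags = group number
  SELECT *,
         SUM(new_group) OVER (PARTITION BY id ORDER BY log_date
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS group_no
  FROM cte2
)
SELECT id, points, MIN(log_date) AS min_date, MAX(log_date) AS max_date
FROM cte3
GROUP BY id, group_no, points
ORDER BY id, group_no
""").fetchall()

for row in groups:
    print(row)
```

With these rows the 10-point run is reported as two groups (2022-01-01 to 2022-01-02, then 2022-01-05 on its own) because of the gap, followed by the 20-point group.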
db<>fiddle
Caution: In your original post, you state that "this is to create a view". Beware that if the above logic is included in a view, the entire result may be recalculated every time the view is accessed, regardless of any ID or date criteria applied. It might make more sense to use the above to populate and periodically refresh a historic roll-up data table for efficient access. Another alternative is to make a stored procedure with appropriate parameters that could filter that data before feeding it into the above.
I'm having an issue where I'm using recursive subquery factoring to use the previous row's values as my next row's values. The problem is that I need to stop using the previous row's values when my product_key changes.
CREATE TABLE MAKE_IT_WORK
(
PRODUCT_KEY NUMBER,
WEEK NUMBER,
OPENING_STOCK NUMBER,
INTAKE NUMBER,
SALES NUMBER,
CLOSING_STOCK NUMBER,
FORWARD_COVER NUMBER
);
Insert into MAKE_IT_WORK (PRODUCT_KEY, WEEK)
Values (1, 1);
Insert into MAKE_IT_WORK (PRODUCT_KEY, WEEK, INTAKE, SALES)
Values (1, 2, 1000, 80);
Insert into MAKE_IT_WORK (PRODUCT_KEY, WEEK, SALES)
Values (1, 3, 70);
Insert into MAKE_IT_WORK (PRODUCT_KEY, WEEK, SALES)
Values (1, 4, 90);
Insert into MAKE_IT_WORK (PRODUCT_KEY, WEEK, SALES)
Values (2, 1, 0);
Insert into MAKE_IT_WORK (PRODUCT_KEY, WEEK, INTAKE, SALES)
Values (2, 2, 6000, 500);
Insert into MAKE_IT_WORK (PRODUCT_KEY, WEEK, SALES)
Values (2, 3, 70);
Insert into MAKE_IT_WORK (PRODUCT_KEY, WEEK, SALES)
Values (2, 4, 350);
CURRENT QUERY
with master
as(select product_key,week,opening_stock ,intake,sales,closing_stock,forward_cover,row_number()over( order by 1) lvl,product_key-1 pkey
from make_it_work),
bdw_knows_best(product_key,week,opening_stock,intake,sales,closing_stock,forward_cover,lvl,pkey) as
(select product_key
,week
,opening_stock
,nvl(intake,0)intake
,sales
,closing_stock
,forward_cover
,lvl
,pkey
from master
where lvl = 1
union all
select a.product_key
,a.week
,case when b.closing_stock < 0 then 0
else b.closing_stock
end opening_stock
,nvl(a.intake,0)intake
,nvl(a.sales,0) sales
,case when nvl(b.closing_stock,0) + nvl(a.intake,0) - nvl(a.sales,0) < 0 THEN 0
else nvl(b.closing_stock,0) + nvl(a.intake,0) - nvl(a.sales,0)
end closing_stock
,a.forward_cover
,b.lvl +1
,a.pkey pkey
from master a,
bdw_knows_best b
where a.lvl = b.lvl +1
)
select product_key,week,opening_stock,intake,sales,closing_stock,forward_cover,lvl,pkey from bdw_knows_best;
REQUIRED
When the product key changes from 1 to 2, I need to use the values from Product_Key 2 and not the last records from Product_Key 1. I need to somehow group the rows into Product_Key buckets (so to speak).
Any help or ideas would be highly appreciated.
You don't need a recursive CTE. Window functions (the OVER clause) will produce the result you want. For example:
select product_key, week, opening_stock, intake, sales,
coalesce(opening_stock, 0)
+ sum(intake) over(partition by product_key order by week)
- sum(sales) over(partition by product_key order by week)
as closing_stock
from make_it_work
order by product_key, week;
Result:
PRODUCT_KEY WEEK OPENING_STOCK INTAKE SALES CLOSING_STOCK
------------ ----- -------------- ------- ------ -------------
1 1
1 2 1000 80 920
1 3 70 850
1 4 90 760
2 1 0
2 2 6000 500 5500
2 3 70 5430
2 4 350 5080
See running example at db<>fiddle.
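The same running-total idea can be checked locally. This is a hedged sketch ported to SQLite via Python's sqlite3 module (SQLite 3.25 or newer) rather than Oracle, with only the columns the query needs; PARTITION BY product_key is what restarts the totals when the product changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE make_it_work (product_key INT, week INT, opening_stock INT,
                           intake INT, sales INT);
INSERT INTO make_it_work (product_key, week)                VALUES (1, 1);
INSERT INTO make_it_work (product_key, week, intake, sales) VALUES (1, 2, 1000, 80);
INSERT INTO make_it_work (product_key, week, sales)         VALUES (1, 3, 70);
INSERT INTO make_it_work (product_key, week, sales)         VALUES (1, 4, 90);
INSERT INTO make_it_work (product_key, week, sales)         VALUES (2, 1, 0);
INSERT INTO make_it_work (product_key, week, intake, sales) VALUES (2, 2, 6000, 500);
INSERT INTO make_it_work (product_key, week, sales)         VALUES (2, 3, 70);
INSERT INTO make_it_work (product_key, week, sales)         VALUES (2, 4, 350);
""")

# Running intake minus running sales, restarted for every product_key.
# SUM ignores NULLs but returns NULL when it has seen no non-NULL value yet,
# which is why week 1 of each product has no closing stock.
stock = conn.execute("""
SELECT product_key, week,
       COALESCE(opening_stock, 0)
       + SUM(intake) OVER (PARTITION BY product_key ORDER BY week)
       - SUM(sales)  OVER (PARTITION BY product_key ORDER BY week)
         AS closing_stock
FROM make_it_work
ORDER BY product_key, week
""").fetchall()

for row in stock:
    print(row)
```

Wrapping each SUM in COALESCE(..., 0) would turn the empty week-1 values into 0 if that is preferred. Note that a plain running total does not reproduce the question's rule of clamping a negative closing stock to 0 each week; on this sample the stock never goes negative, so the difference does not show.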
I have a table with historical stocks prices for hundreds of stocks. I need to extract only those stocks that reached $10 or greater for the first time.
Stock  Price  Date
-----  -----  ----------
AAA    9      2021-10-01
AAA    10     2021-10-02
AAA    8      2021-10-03
AAA    10     2021-10-04
BBB    9      2021-10-01
BBB    11     2021-10-02
BBB    12     2021-10-03
Is there a way to count how many times each stock hit >= 10 in order to pull only those where count = 1 (in this case it would be stock BBB considering it never reached 10 in the past)?
Since I couldn't figure out how to create the count, I've tried the manipulations below with min/max dates, but this looks like a rather awkward approach. Any idea for a simpler solution?
with query1 as (
select Stock, min(date) as min_greater10_dt
from t
where Price >= 10
group by Stock
), query2 as (
select Stock, max(date) as max_greater10_dt
from t
where Price >= 10
group by Stock
)
select Stock
from t a
join query1 b on b.Stock = a.Stock
join query2 c on c.Stock = a.Stock
where not(a.Price < 10 and a.Date between b.min_greater10_dt and c.max_greater10_dt)
This is a type of gaps-and-islands problem which can be solved as follows:
detect the change from < 10 to >= 10 using a lagged price
count the number of such changes
filter in only stock where this has happened exactly once
and take the first row since you only want the stock (you could group by here but a row number allows you to select the entire row should you wish to).
declare @Table table (Stock varchar(3), Price money, [Date] date);
insert into @Table (Stock, Price, [Date])
values
('AAA', 9, '2021-10-01'),
('AAA', 10, '2021-10-02'),
('AAA', 8, '2021-10-03'),
('AAA', 10, '2021-10-04'),
('BBB', 9, '2021-10-01'),
('BBB', 11, '2021-10-02'),
('BBB', 12, '2021-10-03');
with cte1 as (
select Stock, Price, [Date]
, row_number() over (partition by Stock, case when Price >= 10 then 1 else 0 end order by [Date] asc) rn
, lag(Price,1,0) over (partition by Stock order by [Date] asc) LaggedStock
from @Table
), cte2 as (
select Stock, Price, [Date], rn, LaggedStock
, sum(case when Price >= 10 and LaggedStock < 10 then 1 else 0 end) over (partition by Stock) StockOver10
from cte1
)
select Stock
--, Price, [Date], rn, LaggedStock, StockOver10 -- debug
from cte2
where Price >= 10
and StockOver10 = 1 and rn = 1;
Returns:
Stock
BBB
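If only the stock symbol is needed, the crossing count can also be done with GROUP BY and HAVING, as the note about grouping suggests. The sketch below is a hedged port to SQLite via Python's sqlite3 module (SQLite 3.25 or newer); the table name prices and the column name price_date are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prices (stock TEXT, price INT, price_date TEXT);
INSERT INTO prices VALUES
  ('AAA',  9, '2021-10-01'),
  ('AAA', 10, '2021-10-02'),
  ('AAA',  8, '2021-10-03'),
  ('AAA', 10, '2021-10-04'),
  ('BBB',  9, '2021-10-01'),
  ('BBB', 11, '2021-10-02'),
  ('BBB', 12, '2021-10-03');
""")

# A "crossing" is a row at or above 10 whose previous price was below 10.
# LAG's third argument (0) makes the very first row compare against 0.
crossed_once = conn.execute("""
WITH flagged AS (
  SELECT stock, price,
         LAG(price, 1, 0) OVER (PARTITION BY stock ORDER BY price_date)
           AS prev_price
  FROM prices
)
SELECT stock
FROM flagged
GROUP BY stock
HAVING SUM(CASE WHEN price >= 10 AND prev_price < 10 THEN 1 ELSE 0 END) = 1
ORDER BY stock
""").fetchall()

print(crossed_once)  # [('BBB',)]
```

AAA crosses the threshold twice (on 2021-10-02 and again on 2021-10-04), so only BBB is returned.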
Note: providing DDL+DML as shown above makes it much easier for people to assist.