Calculate balance amount per product in redshift - sql

DB-Fiddle
CREATE TABLE inventory (
id SERIAL PRIMARY KEY,
stock_date DATE,
product VARCHAR(255),
inbound_quantity INT,
outbound_quantity INT
);
INSERT INTO inventory
(stock_date, product, inbound_quantity, outbound_quantity
)
VALUES
('2020-01-01', 'Product_A', '900', '0'),
('2020-01-02', 'Product_A', '0', '300'),
('2020-01-03', 'Product_A', '400', '250'),
('2020-01-04', 'Product_A', '0', '100'),
('2020-01-05', 'Product_A', '700', '500'),
('2020-01-03', 'Product_B', '850', '0'),
('2020-01-08', 'Product_B', '100', '120'),
('2020-02-20', 'Product_B', '0', '360'),
('2020-02-25', 'Product_B', '410', '230');
Expected Result:
stock_date | product   | inbound_quantity | outbound_quantity | balance
-----------|-----------|------------------|-------------------|--------
2020-01-01 | Product_A | 900              | 0                 | 900
2020-01-02 | Product_A | 0                | 300               | 600
2020-01-03 | Product_A | 400              | 250               | 750
2020-01-04 | Product_A | 0                | 100               | 650
2020-01-05 | Product_A | 700              | 500               | 850
2020-01-03 | Product_B | 740              | 0                 | 740
2020-01-08 | Product_B | 100              | 120               | 720
2020-02-20 | Product_B | 0                | 360               | 360
2020-02-25 | Product_B | 410              | 230               | 540
2020-03-09 | Product_B | 290              | 0                 | 830
I want to calculate the balance per product.
So far I have developed the query below, but it does not work.
I get the error: window "product" does not exist.
SELECT
iv.stock_date AS stock_date,
iv.product AS product,
iv.inbound_quantity AS inbound_quantity,
iv.outbound_quantity AS outbound_quantity,
SUM(iv.inbound_quantity - iv.outbound_quantity) OVER
(product ORDER BY stock_date ASC ROWS UNBOUNDED PRECEDING) AS Balance
FROM inventory iv
GROUP BY 1,2,3,4
ORDER BY 2,1;
How do I need to modify the query to make it work?

You are almost there.
You should add PARTITION BY in front of product:
SELECT
iv.stock_date AS stock_date,
iv.product AS product,
iv.inbound_quantity AS inbound_quantity,
iv.outbound_quantity AS outbound_quantity,
SUM(iv.inbound_quantity - iv.outbound_quantity) OVER
(partition by product ORDER BY stock_date ASC ROWS UNBOUNDED PRECEDING) AS Balance
FROM inventory iv
GROUP BY 1,2,3,4
ORDER BY 2,1;
So the query should look like the one above.

The error window "product" does not exist occurs because, without the PARTITION BY keywords, the parser reads the bare name product at the start of the OVER clause as a reference to a named window, and no window with that name is defined.
To fix this, write PARTITION BY product inside the OVER clause, so that the SUM function calculates the balance per product. The product column is already covered by the GROUP BY clause, so nothing needs to change there.
Try this query:
SELECT
iv.stock_date AS stock_date,
iv.product AS product,
iv.inbound_quantity AS inbound_quantity,
iv.outbound_quantity AS outbound_quantity,
SUM(iv.inbound_quantity - iv.outbound_quantity) OVER
(PARTITION BY product ORDER BY stock_date ASC ROWS UNBOUNDED PRECEDING) AS Balance
FROM inventory iv
GROUP BY 1,2,3,4
ORDER BY 2,1;
This way, the SUM function accumulates inbound_quantity - outbound_quantity separately for each product instead of across all products.
The query will return the expected result.
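To see the partitioned running balance in action, here is a minimal sketch that runs the same window expression against an in-memory SQLite database. SQLite is only a stand-in for Redshift (an assumption made so the example runs anywhere), the sample rows are a subset of the question's data, and the GROUP BY is left out because the sample has one row per product and date.

```python
import sqlite3

# In-memory SQLite stands in for Redshift here (an assumption for runnability);
# the PARTITION BY / ORDER BY / ROWS syntax is the same in both.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inventory (
        stock_date TEXT,
        product TEXT,
        inbound_quantity INTEGER,
        outbound_quantity INTEGER
    )
""")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?)",
    [
        ("2020-01-01", "Product_A", 900, 0),
        ("2020-01-02", "Product_A", 0, 300),
        ("2020-01-03", "Product_A", 400, 250),
        ("2020-01-03", "Product_B", 850, 0),
        ("2020-01-08", "Product_B", 100, 120),
    ],
)

# PARTITION BY restarts the running sum for every product.
balances = conn.execute("""
    SELECT stock_date, product,
           SUM(inbound_quantity - outbound_quantity) OVER (
               PARTITION BY product
               ORDER BY stock_date
               ROWS UNBOUNDED PRECEDING
           ) AS balance
    FROM inventory
    ORDER BY product, stock_date
""").fetchall()

for row in balances:
    print(row)
```

The balance for Product_B starts again from its own first row rather than continuing from the last Product_A value.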

Related

Calculate balance amount in redshift

DB-Fiddle
CREATE TABLE inventory (
id SERIAL PRIMARY KEY,
stock_date DATE,
inbound_quantity INT,
outbound_quantity INT
);
INSERT INTO inventory
(stock_date, inbound_quantity, outbound_quantity
)
VALUES
('2020-01-01', '900', '0'),
('2020-01-02', '0', '300'),
('2020-01-03', '400', '250'),
('2020-01-04', '0', '100'),
('2020-01-05', '700', '500');
Expected Output:
stock_date | inbound_quantity | outbound_quantity | balance
-----------|------------------|-------------------|--------
2020-01-01 | 900              | 0                 | 900
2020-01-02 | 0                | 300               | 600
2020-01-03 | 400              | 250               | 750
2020-01-04 | 0                | 100               | 650
2020-01-05 | 700              | 500               | 850
Query:
SELECT
iv.stock_date AS stock_date,
iv.inbound_quantity AS inbound_quantity,
iv.outbound_quantity AS outbound_quantity,
SUM(iv.inbound_quantity - iv.outbound_quantity) OVER (ORDER BY stock_date ASC) AS Balance
FROM inventory iv
GROUP BY 1,2,3
ORDER BY 1;
With the above query I am able to calculate the balance of inbound_quantity and outbound_quantity in PostgreSQL.
However, when I run the same query in Amazon-Redshift I get this error:
Amazon Invalid operation: Aggregate window functions with an ORDER BY clause require a frame clause;
1 statement failed.
How do I need to change the query to also make it work in Redshift?
As the error says, you need to add a frame clause, namely ROWS UNBOUNDED PRECEDING, to your window function.
SELECT iv.stock_date AS stock_date,
iv.inbound_quantity AS inbound_quantity,
iv.outbound_quantity AS outbound_quantity,
SUM(iv.inbound_quantity - iv.outbound_quantity) OVER (
ORDER BY stock_date ASC
ROWS UNBOUNDED PRECEDING
) AS Balance
FROM inventory iv
GROUP BY 1,2,3
ORDER BY 1;
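One detail worth knowing when adding the frame clause: in PostgreSQL and SQLite, an ordered window without an explicit frame defaults to a RANGE frame, which treats rows with equal ORDER BY values as peers and sums them together. With unique dates, as in the question, ROWS UNBOUNDED PRECEDING gives the same result. The sketch below uses SQLite as a stand-in and a made-up table with a duplicate date (both assumptions, not part of the question) to show where the two differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inventory (
        id INTEGER PRIMARY KEY,
        stock_date TEXT,
        net_quantity INTEGER
    )
""")
# Hypothetical rows: two movements share the same date.
conn.executemany(
    "INSERT INTO inventory (stock_date, net_quantity) VALUES (?, ?)",
    [("2020-01-01", 10), ("2020-01-01", 20), ("2020-01-02", 5)],
)

rows = conn.execute("""
    SELECT id,
           -- no frame clause: default RANGE frame, peers are summed together
           SUM(net_quantity) OVER (ORDER BY stock_date) AS range_total,
           -- explicit ROWS frame: one row is added at a time
           SUM(net_quantity) OVER (
               ORDER BY stock_date ROWS UNBOUNDED PRECEDING
           ) AS rows_total
    FROM inventory
    ORDER BY id
""").fetchall()

range_totals = [r[1] for r in rows]
rows_totals = [r[2] for r in rows]
print(range_totals)
print(rows_totals)
```

With the RANGE default, both rows dated 2020-01-01 report the combined total. With ROWS, the order in which tied rows are accumulated is not fixed unless the ORDER BY includes a tie-breaker such as id.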

Generate billing balance depending on the number of payments made

Below is the table I have created and inserted values in it:
CREATE TABLE Invoices
(
InvID int,
InvAmount int
)
GO
INSERT INTO Invoices
VALUES (1, 543), (2, 749)
CREATE TABLE payments
(
PayID int IDENTITY (1, 1),
InvID int,
PayAmount int,
PayDate date
)
INSERT INTO payments
VALUES (1, 20, '2016-01-01'),
(1, 35, '2016-01-07'),
(1, 78, '2016-01-13'),
(1, 52, '2016-01-25'),
(2, 40, '2016-01-03'),
(2, 54, '2016-01-15'),
(2, 63, '2016-01-17'),
(2, 59, '2016-01-28')
SELECT * FROM Invoices
SELECT * FROM payments
The Invoices table lists the customer billings (the first billing totals 543, the second billing totals 749).
The payments table lists the payments the customer made toward each billing. For example, on January 1st the customer paid 20 USD toward billing no. 1 (which totals 543 USD), and on January 3rd the customer paid 40 USD toward billing no. 2 (which totals 749 USD).
Now the question is:
Write a query that displays the billing balance, based on the number of payments made so far.
The query result should exactly look like the screenshot below:
This is what I have tried:
SELECT
payments.InvID,
InvAmount - SUM(PayAmount) OVER (PARTITION BY payments.InvID ORDER BY PayID
ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING) AS 'InvAmount',
PayDate, PayAmount,
InvAmount - SUM(PayAmount) OVER (PARTITION BY payments.InvID ORDER BY PayID) AS 'Balance'
FROM
Invoices
JOIN
payments ON payments.InvID = Invoices.InvID
After running the query, I got the following result which is shown below:
As you can see from the screenshot above, I nearly got the result I wanted.
The only problem is that InvAmount returns exactly the same values as Balance in every row. I am not able to retain the starting values of InvAmount, which are 543 (InvID = 1) and 749 (InvID = 2) respectively.
How can this issue be solved?
You can add the PayAmount back in the calculation:
InvAmount
+ PayAmount
- SUM(PayAmount) OVER (PARTITION BY payments.InvID
ORDER BY PayID
ROWS BETWEEN UNBOUNDED PRECEDING
AND 0 FOLLOWING) AS InvAmount
Or use BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING. In that case the frame is empty for the very first row of each invoice, so the SUM returns NULL and you need to handle it:
InvAmount
- ISNULL(SUM(PayAmount) OVER (PARTITION BY payments.InvID
ORDER BY PayID
ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING), 0) AS InvAmount
db<>fiddle demo
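The second variant can be checked with a small runnable sketch. It uses SQLite instead of SQL Server (an assumption for portability), so ISNULL is swapped for COALESCE and IDENTITY for INTEGER PRIMARY KEY; the alias OpeningAmount is a name chosen here for illustration, while the table data comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Invoices (InvID INTEGER, InvAmount INTEGER);
    CREATE TABLE payments (
        PayID INTEGER PRIMARY KEY,  -- auto-assigned, like IDENTITY(1, 1)
        InvID INTEGER,
        PayAmount INTEGER,
        PayDate TEXT
    );
    INSERT INTO Invoices VALUES (1, 543), (2, 749);
    INSERT INTO payments (InvID, PayAmount, PayDate) VALUES
        (1, 20, '2016-01-01'), (1, 35, '2016-01-07'),
        (1, 78, '2016-01-13'), (1, 52, '2016-01-25'),
        (2, 40, '2016-01-03'), (2, 54, '2016-01-15'),
        (2, 63, '2016-01-17'), (2, 59, '2016-01-28');
""")

result = conn.execute("""
    SELECT p.InvID,
           -- amount still open BEFORE this payment: sum only earlier rows;
           -- the frame is empty on the first row, hence COALESCE(..., 0)
           i.InvAmount - COALESCE(SUM(p.PayAmount) OVER (
               PARTITION BY p.InvID ORDER BY p.PayID
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0)
               AS OpeningAmount,
           p.PayAmount,
           -- amount still open AFTER this payment
           i.InvAmount - SUM(p.PayAmount) OVER (
               PARTITION BY p.InvID ORDER BY p.PayID
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Balance
    FROM Invoices i
    JOIN payments p ON p.InvID = i.InvID
    ORDER BY p.InvID, p.PayID
""").fetchall()

for row in result:
    print(row)
```

Each row's opening amount equals the previous row's balance within the same invoice, and the first row of each invoice keeps the full invoice amount.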
You can use PARTITION BY clause for invoice identifier and then deduct the pay amount from the invoice amount as given below
;WITH CTE_Balance as
(
SELECT i.InvID, i.InvAmount, p.PayAmount, p.PayDate
, InvAmount - sum(payamount) over (partition by p.invid order by paydate rows between unbounded preceding and current row) as balance
, ROW_NUMBER() over(partition by p.invid order by p.paydate) as rnk
FROM payments as p
inner join Invoices as i
on i.InvID = p.InvID
)
SELECT invid, case when rnk =1 then invamount else lag(balance) over(partition by invid order by paydate) end as invamount
,payAmount, paydate, balance
FROM CTE_Balance
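invid | invamount | payAmount | paydate    | balance
------|-----------|-----------|------------|--------
1     | 543       | 20        | 2016-01-01 | 523
1     | 523       | 35        | 2016-01-07 | 488
1     | 488       | 78        | 2016-01-13 | 410
1     | 410       | 52        | 2016-01-25 | 358
2     | 749       | 40        | 2016-01-03 | 709
2     | 709       | 54        | 2016-01-15 | 655
2     | 655       | 63        | 2016-01-17 | 592
2     | 592       | 59        | 2016-01-28 | 533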

Running total with if clause or do while

Consider the following table with 3 columns.
Use this to create a SQL query to list the top products by revenue that make up 25% of the total revenue in 2020.
(i.e. If total revenue is 1000 then list of top products that account for <= 250)
Table ProductRevenue:
Date_DD ... date(YYYY-MM-DD)
Product_Name ... varchar(250)
Revenue ... decimal(10,2)
Sample data:
Date_DD Product_Name Revenue
-------------------------------------
2020-11-30 a 100
2020-10-02 b 100
2020-07-07 c 100
2020-04-04 d 100
2020-05-05 f 50
2020-06-06 g 120
2020-05-30 h 90
2020-11-13 k 120
2020-01-30 l 120
I used the code below but don't know how to add the WHERE clause. Can anyone help?
SELECT
product_name, revenue,
SUM(revenue) OVER (ORDER BY revenue DESC, product_name) AS running_total
FROM
TABLE_PRODUCT_REVENUE
new code
select product_name, revenue, running_total from
(SELECT product_name, revenue, SUM(revenue) OVER ( ORDER BY revenue DESC, product_name) AS running_total
FROM TABLE_PRODUCT_REVENUE ) o
where running_total<(select max(running_total) from
(SELECT product_name, revenue, SUM(revenue) OVER ( ORDER BY revenue DESC, product_name) AS running_total
FROM TABLE_PRODUCT_REVENUE ) o )*0.25
group by product_name, revenue, running_total
order by running_total
I just need to know where I can add the WHERE clause with YEAR([Date_DD]) = 2020. Can anyone help?
The question was not very descriptive; however, the queries below might help you narrow down the issue.
The first query shows the running total per product.
SELECT
product_name, revenue,
SUM(revenue) OVER (partition by product_name ORDER BY revenue DESC, product_name) AS running_total
FROM
TABLE_PRODUCT_REVENUE
The second query returns the rows where the product's running total is at least a given amount:
select
*, case when running_total >= 1000 then 'top selling product' else null end
from
(
SELECT
product_name, revenue,
SUM(revenue) OVER (partition by product_name ORDER BY revenue DESC, product_name) AS running_total
FROM
TABLE_PRODUCT_REVENUE
) t
where running_total >= xxx -- xxx: the threshold amount
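On the question of where the year filter goes: it belongs in the innermost query, before the window functions run, so that both the running total and the 25% threshold are computed from 2020 rows only. The sketch below illustrates this under some assumptions: it uses SQLite, where strftime('%Y', ...) replaces YEAR(), the table is named product_revenue, there is one row per product, and an extra 2019 row (not in the question's sample) is added to show the filter taking effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_revenue (
        date_dd TEXT,
        product_name TEXT,
        revenue NUMERIC
    )
""")
conn.executemany(
    "INSERT INTO product_revenue VALUES (?, ?, ?)",
    [
        ("2020-11-30", "a", 100), ("2020-10-02", "b", 100),
        ("2020-07-07", "c", 100), ("2020-04-04", "d", 100),
        ("2020-05-05", "f", 50), ("2020-06-06", "g", 120),
        ("2020-05-30", "h", 90), ("2020-11-13", "k", 120),
        ("2020-01-30", "l", 120),
        ("2019-12-31", "z", 500),  # hypothetical row outside 2020
    ],
)

top_products = conn.execute("""
    SELECT product_name, revenue, running_total
    FROM (
        SELECT product_name, revenue,
               SUM(revenue) OVER (
                   ORDER BY revenue DESC, product_name
                   ROWS UNBOUNDED PRECEDING
               ) AS running_total,
               SUM(revenue) OVER () AS total_revenue  -- total of filtered rows
        FROM product_revenue
        WHERE strftime('%Y', date_dd) = '2020'  -- YEAR(Date_DD) = 2020 elsewhere
    ) t
    WHERE running_total <= 0.25 * total_revenue
    ORDER BY running_total
""").fetchall()

print(top_products)
```

Computing the total with SUM(revenue) OVER () in the same subquery avoids repeating the filtered subquery a second time just to obtain the maximum running total.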

Select values that have a minus difference between two timestamps and show their complete history in the table

DB-Fiddle
CREATE TABLE logistics (
id SERIAL PRIMARY KEY,
time_stamp DATE,
product VARCHAR(255),
quantity INT
);
INSERT INTO logistics
(time_stamp, product, quantity)
VALUES
('2020-01-14', 'Product_A', '100'),
('2020-01-14', 'Product_B', '300'),
('2020-01-15', 'Product_B', '400'),
('2020-01-15', 'Product_C', '350'),
('2020-01-16', 'Product_B', '530'),
('2020-01-16', 'Product_C', '250'),
('2020-01-16', 'Product_D', '670'),
('2020-01-17', 'Product_C', '380'),
('2020-01-17', 'Product_D', '980'),
('2020-01-17', 'Product_E', '700'),
('2020-01-17', 'Product_F', '450');
Expected Result
time_stamp | product   | difference
-----------|-----------|-----------
2020-01-15 | Product_C | 350
2020-01-16 | Product_C | -100
2020-01-17 | Product_C | 130
I want to do the following two things:
1. Extract the products from the table whose quantity has decreased from one timestamp to the next.
2. Display the history of those products over all timestamps.
With the below query I am able to do Step 1 but I am wondering how I need to modify it to also include the history of the selected products.
SELECT
t1.time_stamp AS time_stamp,
t1.product AS product,
SUM(t1.difference) AS difference
FROM
(SELECT
l.time_stamp AS time_stamp,
l.product AS product,
Coalesce(l.quantity-LAG(l.quantity) OVER (Partition by l.product ORDER BY l.product, l.time_stamp), l.quantity) AS difference
FROM logistics l
ORDER BY 1,2) t1
WHERE t1.difference < 0
GROUP BY 1,2
ORDER BY 1,2;
Do you have any idea?
You can use a MAX OVER to calculate a flag per product.
Then filter on the flag.
SELECT q2.time_stamp, q2.product, q2.difference
FROM (
SELECT q1.*
, MAX(CASE WHEN q1.quantity < q1.prev_quantity THEN 1 ELSE 0 END)
OVER (PARTITION BY q1.product) AS has_difference
, (q1.quantity - coalesce(q1.prev_quantity, 0)) AS difference
FROM (
SELECT l.product, l.time_stamp, l.quantity
, LAG(l.quantity) OVER (PARTITION BY l.product ORDER BY l.time_stamp) AS prev_quantity
FROM logistics l
) AS q1
) q2
WHERE q2.has_difference = 1
ORDER BY q2.product, q2.time_stamp;
time_stamp | product | difference
:--------- | :-------- | ---------:
2020-01-15 | Product_C | 350
2020-01-16 | Product_C | -100
2020-01-17 | Product_C | 130
db<>fiddle here
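For readers without access to the fiddle, the MAX OVER flag approach can be reproduced locally. This is a minimal sketch using SQLite in place of PostgreSQL (an assumption) with the question's sample rows; the query structure follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logistics (time_stamp TEXT, product TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO logistics VALUES (?, ?, ?)",
    [
        ("2020-01-14", "Product_A", 100), ("2020-01-14", "Product_B", 300),
        ("2020-01-15", "Product_B", 400), ("2020-01-15", "Product_C", 350),
        ("2020-01-16", "Product_B", 530), ("2020-01-16", "Product_C", 250),
        ("2020-01-16", "Product_D", 670), ("2020-01-17", "Product_C", 380),
        ("2020-01-17", "Product_D", 980), ("2020-01-17", "Product_E", 700),
        ("2020-01-17", "Product_F", 450),
    ],
)

history = conn.execute("""
    SELECT q2.time_stamp, q2.product, q2.difference
    FROM (
        SELECT q1.time_stamp, q1.product,
               -- 1 for every row of a product that decreased at least once
               MAX(CASE WHEN q1.quantity < q1.prev_quantity THEN 1 ELSE 0 END)
                   OVER (PARTITION BY q1.product) AS has_difference,
               q1.quantity - COALESCE(q1.prev_quantity, 0) AS difference
        FROM (
            SELECT l.product, l.time_stamp, l.quantity,
                   LAG(l.quantity) OVER (
                       PARTITION BY l.product ORDER BY l.time_stamp
                   ) AS prev_quantity
            FROM logistics l
        ) AS q1
    ) AS q2
    WHERE q2.has_difference = 1
    ORDER BY q2.product, q2.time_stamp
""").fetchall()

for row in history:
    print(row)
```

Because the flag is computed over the whole product partition, the filter keeps every row of a flagged product, including the rows where the quantity increased.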
Use EXISTS:
WITH cte AS (
SELECT time_stamp, product,
quantity - LAG(quantity, 1, 0) OVER (PARTITION BY product ORDER BY time_stamp) difference
FROM logistics
)
SELECT c1.*
FROM cte c1
WHERE EXISTS (
SELECT 1
FROM cte c2
WHERE c2.product = c1.product AND c2.difference < 0
)
ORDER BY c1.product, c1.time_stamp;
See the demo.
DB-Fiddle
WITH cte AS
(SELECT
l.time_stamp AS time_stamp,
l.product AS product,
Coalesce(l.quantity-LAG(l.quantity) OVER (Partition by l.product ORDER BY l.product, l.time_stamp), l.quantity) AS difference
FROM logistics l
ORDER BY 1,2)
SELECT
t1.time_stamp AS time_stamp,
t1.product AS product,
SUM(t1.difference) AS difference
FROM cte t1
WHERE EXISTS
(SELECT
t2.product AS product
FROM cte t2
WHERE t2.difference < 0
AND t2.product = t1.product
GROUP BY 1
ORDER BY 1)
GROUP BY 1,2
ORDER BY 1,2;

Display only certain columns in results despite case statement in query

DB-Fiddle
CREATE TABLE inventory (
id SERIAL PRIMARY KEY,
product VARCHAR,
quantity DECIMAL,
avg_price DECIMAL,
normal_price DECIMAL
);
INSERT INTO inventory
(product, quantity, avg_price, normal_price)
VALUES
('product_01', '800', '10', '10'),
('product_02', '300', '20', '90'),
('product_03', '200', '0', '50'),
('product_04', '500', '30', '80'),
('product_05', '600', '0', '60'),
('product_06', '400', '50', '40');
Expected Result:
product    | quantity | final_price
-----------|----------|------------
product_01 | 800      | 10
product_02 | 300      | 20
product_03 | 200      | 50
product_04 | 500      | 30
product_05 | 600      | 60
product_06 | 400      | 50
I only want to display the column quantity and final_price.
However, I have to use a CASE statement in my query, and the PostgreSQL syntax is forcing me to add the columns avg_price and normal_price to the query in order to make the CASE statement work:
SELECT
iv.product AS product,
iv.avg_price AS avg_price,
iv.normal_price AS normal_price,
SUM(iv.quantity) AS quantity,
(CASE WHEN iv.avg_price = 0 THEN iv.normal_price ELSE iv.avg_price END) AS final_price
FROM inventory iv
GROUP BY 1,2,3
ORDER BY 1;
I am not sure if this is possible in PostgreSQL, but is there a way to display only the columns shown in the expected result?
I would suggest aggregating by the expression itself:
SELECT iv.product AS product,
SUM(iv.quantity) AS quantity,
(CASE WHEN iv.avg_price = 0 THEN iv.normal_price ELSE iv.avg_price END) AS final_price
FROM inventory iv
GROUP BY iv.product, final_price
ORDER BY 1;
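Grouping by the expression can be tried out with the sketch below. It runs on SQLite rather than PostgreSQL (an assumption; both accept an output column alias in GROUP BY), and the product names product_01 to product_06 are taken from the expected result table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inventory (
        product TEXT,
        quantity INTEGER,
        avg_price INTEGER,
        normal_price INTEGER
    )
""")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?)",
    [
        ("product_01", 800, 10, 10), ("product_02", 300, 20, 90),
        ("product_03", 200, 0, 50), ("product_04", 500, 30, 80),
        ("product_05", 600, 0, 60), ("product_06", 400, 50, 40),
    ],
)

# Grouping by the CASE result (through its alias) means avg_price and
# normal_price do not have to appear in the select list.
prices = conn.execute("""
    SELECT product,
           SUM(quantity) AS quantity,
           CASE WHEN avg_price = 0 THEN normal_price ELSE avg_price END
               AS final_price
    FROM inventory
    GROUP BY product, final_price
    ORDER BY product
""").fetchall()

for row in prices:
    print(row)
```

Note that rows of the same product with different final prices would still form separate groups, which matches the behaviour of the original three-column GROUP BY only when avg_price and normal_price map to the same final price.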
Use explicit GROUP BY iv.product, iv.avg_price, iv.normal_price instead of GROUP BY 1, 2, 3:
SELECT
iv.product AS product,
SUM(iv.quantity) AS quantity,
(CASE WHEN iv.avg_price = 0 THEN iv.normal_price ELSE iv.avg_price END) AS final_price
FROM inventory iv
GROUP BY iv.product, iv.avg_price, iv.normal_price
ORDER BY 1;