Aggregate quantity column per distinct date in a SQL table

I want to sum the quantity column from the first date in the table (2016-02-17 in this table) up to each distinct date in the table. The result should contain the cumulative sum of quantities for each distinct date.
How can I write a query for this in SQL Server?
ID| quantity | date
---+----------+-----
18 | 6 | 2016-02-17 00:00:00.000
19 | 6 | 2016-02-17 00:00:00.000
18 | 4 | 2016-02-17 00:00:00.000
19 | 3 | 2016-02-18 00:00:00.000
18 | 1 | 2016-02-18 00:00:00.000
19 | 5 | 2016-02-18 00:00:00.000
18 | 6 | 2016-02-19 00:00:00.000
19 | 7 | 2016-02-19 00:00:00.000
18 | 8 | 2016-02-19 00:00:00.000
19 | 9 | 2016-02-19 00:00:00.000
Expected output:
| Date | quantity |
|------------|----------|
| 2016-02-17 | 16 |
| 2016-02-18 | 25 |
| 2016-02-19 | 55 |

The aggregate function SUM with GROUP BY gives you the total for each distinct date; a windowed SUM over those totals, ordered by date, then turns them into the running total.
SELECT Date,
SUM(quantity) OVER(ORDER BY Date) quantity
FROM(
SELECT DATE, SUM(quantity) quantity
FROM Your_Table
GROUP BY DATE
)A
Check the SQL Fiddle for reference.
If you want the running total per ID, use this. The PARTITION BY makes the difference.
SELECT Id, Date,
SUM(quantity) OVER(PARTITION BY ID ORDER BY Date) quantity
FROM(
SELECT Id, DATE, SUM(quantity) quantity
FROM Your_Table
GROUP BY Id, DATE
)A
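The per-ID variant can be checked against the sample data from the question. This is only a sketch: it runs the same query shape in SQLite through Python's sqlite3 module rather than in SQL Server, assumes the bundled SQLite is 3.25 or newer (window function support), and uses your_table as a hypothetical table name.

```python
import sqlite3

# In-memory SQLite database; your_table is a hypothetical name,
# the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, quantity INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (18, 6, "2016-02-17"), (19, 6, "2016-02-17"), (18, 4, "2016-02-17"),
        (19, 3, "2016-02-18"), (18, 1, "2016-02-18"), (19, 5, "2016-02-18"),
        (18, 6, "2016-02-19"), (19, 7, "2016-02-19"), (18, 8, "2016-02-19"),
        (19, 9, "2016-02-19"),
    ],
)

# Inner query: one total per (id, date).
# Outer query: running total of those totals within each id.
per_id_running = conn.execute(
    """
    SELECT id, date,
           SUM(quantity) OVER (PARTITION BY id ORDER BY date) AS quantity
    FROM (
        SELECT id, date, SUM(quantity) AS quantity
        FROM your_table
        GROUP BY id, date
    ) AS a
    ORDER BY id, date
    """
).fetchall()

for row in per_id_running:
    print(row)
```

For ID 18 the running totals come out as 10, 11, 25 and for ID 19 as 6, 14, 30; the two final values add up to the overall total of 55 in the expected output.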

You do not need a subquery or CTE to use window functions with aggregation:
SELECT DATE, SUM(quantity) as day_quantity,
SUM(SUM(quantity)) OVER (ORDER BY DATE) as running_quantity
FROM Your_Table
GROUP BY DATE
ORDER BY DATE;
If you want the results ordered by date (as implied by your result set), you should include an explicit ORDER BY.
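As a quick check of the single-query form, here is a sketch that runs it in SQLite via Python's sqlite3 module instead of SQL Server. It assumes SQLite 3.25 or newer, and your_table is a hypothetical table name holding the question's sample rows.

```python
import sqlite3

# In-memory SQLite database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, quantity INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (18, 6, "2016-02-17"), (19, 6, "2016-02-17"), (18, 4, "2016-02-17"),
        (19, 3, "2016-02-18"), (18, 1, "2016-02-18"), (19, 5, "2016-02-18"),
        (18, 6, "2016-02-19"), (19, 7, "2016-02-19"), (18, 8, "2016-02-19"),
        (19, 9, "2016-02-19"),
    ],
)

# The inner SUM(quantity) is the GROUP BY aggregate (one value per date);
# the outer SUM(...) OVER (...) is the window function that accumulates it.
running = conn.execute(
    """
    SELECT date,
           SUM(quantity) AS day_quantity,
           SUM(SUM(quantity)) OVER (ORDER BY date) AS running_quantity
    FROM your_table
    GROUP BY date
    ORDER BY date
    """
).fetchall()

for row in running:
    print(row)
```

The running_quantity column reproduces the expected output from the question: 16, 25, 55.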

Another approach is to compute a per-row running total first and then keep the largest value for each date:
WITH
b as (
Select my_date,
SUM(quantity) Over(order by my_date rows between unbounded preceding and current row) running_total
from main_table
)
SELECT my_date, max(running_total) running_total
from b group by my_date
order by my_date

Related

How to query increased sales every day a month in SQL Redshift?

Can someone help me with a Redshift SQL query to get the result described below?
My table:
SalesDate | Amount($)
2022-03-01 | 4
2022-03-01 | 5
2022-03-02 | 3
2022-03-02 | 10
2022-03-02 | 12
2022-03-03 | 1
etc..
I want a cumulative (running) sales table grouped by SalesDate:
SalesDate | Amount($)
2022-03-01 | 9
2022-03-02 | 34
etc...
Currently, I tried this query, but it doesn't work:
select distinct salesdate::date as date_number
, sum(*) over (order by salesdate::date) asc ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as amount
from mytable
where salesdate >= '2022-03-01'
The result is not what I wanted. The amount increases, but per row rather than per date:
SalesDate | Amount($)
2022-03-01 | 4
2022-03-01 | 9
2022-03-02 | 12
2022-03-02 | 22
2022-03-02 | 34
You can GROUP BY with an aggregate function in a subquery first, and then apply the window function to the result.
SELECT date_number,
sum(amount) over (order by date_number) totalAmount
FROM (
select salesdate::date as date_number,
sum(Amount) as amount
from mytable
where salesdate >= '2022-03-01'
GROUP BY salesdate::date
) t1
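To see why grouping first matters, the sketch below compares a per-row ROWS frame with the group-then-window query. It runs in SQLite through Python's sqlite3 module, not Redshift, so the ::date cast is dropped and the dates are stored as plain text; rowid is a SQLite-specific tie-breaker used here only to make the per-row order deterministic.

```python
import sqlite3

# In-memory SQLite stand-in for the Redshift table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (salesdate TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("2022-03-01", 4), ("2022-03-01", 5),
        ("2022-03-02", 3), ("2022-03-02", 10), ("2022-03-02", 12),
        ("2022-03-03", 1),
    ],
)

# A ROWS frame accumulates row by row, so every input row yields its own total.
per_row = conn.execute(
    """
    SELECT salesdate,
           SUM(amount) OVER (
               ORDER BY salesdate, rowid
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS amount
    FROM mytable
    WHERE salesdate >= '2022-03-01'
    ORDER BY salesdate, rowid
    """
).fetchall()

# Grouping first collapses each date to one row before the window runs.
per_date = conn.execute(
    """
    SELECT date_number,
           SUM(amount) OVER (ORDER BY date_number) AS total_amount
    FROM (
        SELECT salesdate AS date_number, SUM(amount) AS amount
        FROM mytable
        WHERE salesdate >= '2022-03-01'
        GROUP BY salesdate
    ) AS t1
    ORDER BY date_number
    """
).fetchall()

print(per_row)
print(per_date)
```

The first query reproduces the unwanted per-row totals (4, 9, 12, 22, 34, ...), while the second returns one cumulative total per date (9, 34, 35).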

How do I summarize sales data in SQL by month for the last 24 months?

I have a large number of rows with sales for different products on various days.
I want to retrieve the sum for each product per month, for the last 24 months.
How do I write a WHERE clause that covers the last 24 months (based on the latest date in the table, not the current date)?
How do I summarize and display the results by month instead of by individual days like 2018-01-24?
**Sample Data Table**
| SalesDate | Product | SLSqty |
| 2018-01-24 | Product A | 25 |
| 2019-06-10 | Product B | 10 |
| 2019-10-07 | Product C | 4 |
| 2020-03-05 | Product A | 20 |
| 2021-09-01 | Product A | 50 |
| 2021-09-01 | Product B | 10 |
| 2021-09-02 | Product C | 3 |
| 2021-09-04 | Product A | 50 |
| 2021-09-07 | Product B | 10 |
**Expected Result**
| SalesMONTH | Product | SLSqty |
| 2019-10-31 | Product C | 4 |
| 2020-03-31 | Product A | 20 |
| 2021-09-30 | Product A | 100 |
| 2021-09-30 | Product B | 20 |
| 2021-09-30 | Product C | 3 |
I would declare a variable that stores the latest date in your table. Then you can use that variable in your WHERE clause.
IF OBJECT_ID('TEMPDB..#TEMP') IS NOT NULL
DROP TABLE #TEMP
CREATE TABLE #TEMP(
[SalesDate] DATE
,[product] NVARCHAR(20)
,[SLSqty] INT
)
INSERT INTO #TEMP([SalesDate],[product],[SLSqty])
VALUES('2018-01-24','Product A',25)
,('2019-06-10','Product B',10)
,('2019-10-07','Product C',4 )
,('2020-03-05','Product A',20)
,('2021-09-01','Product A',50)
,('2021-09-01','Product B',10)
,('2021-09-02','Product C',3 )
,('2021-09-04','Product A',50)
,('2021-09-07','Product B',10)
DECLARE @DATEVAR AS DATE = (SELECT MAX(#TEMP.SalesDate) FROM #TEMP)
The last line declares the variable. If you select @DATEVAR, you get a single date, the latest SalesDate in the table (2021-09-07 for the sample data).
Then you use it in the WHERE clause. Since you want the 24 months before the latest date, I would use the DATEDIFF(MONTH, start, end) function there. It returns the number of month boundaries crossed between the two dates as an integer, and you simply constrain it to be 24 or less.
SELECT #TEMP.SalesDate
,#TEMP.product
,#TEMP.SLSqty
,DATEDIFF(MONTH,#TEMP.SalesDate,@DATEVAR) [# of months Diff]
FROM #TEMP
WHERE DATEDIFF(MONTH,#TEMP.SalesDate,@DATEVAR) <= 24
Now you have to aggregate the sales grouped by the year-month and product.
I compute year-month by calculating an integer like 202109 (Sept. 2021)
SELECT --#TEMP.SalesDate --(YOU HAVE TO TAKE THIS OUT FOR THE GROUP BY)
YEAR(#TEMP.SalesDate)*100+MONTH(#TEMP.SalesDate) [year-month for GROUP BY]
,#TEMP.product
,SUM(#TEMP.SLSqty) SLSqty
-- ,DATEDIFF(MONTH,#TEMP.SalesDate,@DATEVAR) [# of months Diff] --(YOU HAVE TO TAKE THIS OUT FOR THE GROUP BY)
FROM #TEMP
WHERE DATEDIFF(MONTH,#TEMP.SalesDate,@DATEVAR) <= 24
GROUP BY YEAR(#TEMP.SalesDate)*100+MONTH(#TEMP.SalesDate)
,#TEMP.product
Here is an Oracle version:
With data (SalesDate, Product, SLSqty) as (
Select date '2018-01-24', 'Product A', 25 from dual union all
Select date '2019-06-10', 'Product B', 10 from dual union all
Select date '2019-10-07', 'Product C', 4 from dual union all
Select date '2020-03-05', 'Product A', 20 from dual union all
Select date '2021-09-01', 'Product A', 50 from dual union all
Select date '2021-09-01', 'Product B', 10 from dual union all
Select date '2021-09-02', 'Product C', 3 from dual union all
Select date '2021-09-04', 'Product A', 50 from dual union all
Select date '2021-09-07', 'Product B', 10 from dual),
theLatest (SalesDate) as (
select max(SalesDate) from data
)
select to_char(d.SalesDate,'YYYY-MM') as SalesMonth, d.Product, sum(d.SLSqty) as SLSqty
from data d
Join theLatest on d.SalesDate >= add_months(theLatest.SalesDate,-24)
group by to_char(d.SalesDate,'YYYY-MM'), d.Product
order by to_char(d.SalesDate,'YYYY-MM'), d.Product
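The same idea (anchor the 24-month window on the latest date in the table, then group by month and product) can be tried on the sample data. This is a sketch in SQLite via Python's sqlite3 module, not SQL Server or Oracle; the table name sales and the snake_case column names are hypothetical, and SQLite's date modifier '-24 months' stands in for add_months.

```python
import sqlite3

# In-memory SQLite database; sales, sales_date, product, sls_qty are
# hypothetical names for the sample data in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sales_date TEXT, product TEXT, sls_qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2018-01-24", "Product A", 25),
        ("2019-06-10", "Product B", 10),
        ("2019-10-07", "Product C", 4),
        ("2020-03-05", "Product A", 20),
        ("2021-09-01", "Product A", 50),
        ("2021-09-01", "Product B", 10),
        ("2021-09-02", "Product C", 3),
        ("2021-09-04", "Product A", 50),
        ("2021-09-07", "Product B", 10),
    ],
)

# The cutoff is 24 months before the latest sales_date in the table,
# not 24 months before today.
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', sales_date) AS sales_month,
           product,
           SUM(sls_qty) AS sls_qty
    FROM sales
    WHERE sales_date >= date((SELECT MAX(sales_date) FROM sales), '-24 months')
    GROUP BY strftime('%Y-%m', sales_date), product
    ORDER BY strftime('%Y-%m', sales_date), product
    """
).fetchall()

for row in monthly:
    print(row)
```

With the latest date at 2021-09-07, the cutoff is 2019-09-07, so the 2018 and June 2019 rows drop out and September 2021 collapses to one row per product.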

How to calculate average of values without including the last value (sql)?

I have a table. I partition it by the id and want to calculate average of the values previous to the current, without including the current value. Here is a sample table:
+----+-------+------------+
| id | Value | Date |
+----+-------+------------+
| 1 | 51 | 2020-11-26 |
| 1 | 45 | 2020-11-25 |
| 1 | 47 | 2020-11-24 |
| 2 | 32 | 2020-11-26 |
| 2 | 51 | 2020-11-25 |
| 2 | 45 | 2020-11-24 |
| 3 | 47 | 2020-11-26 |
| 3 | 32 | 2020-11-25 |
| 3 | 35 | 2020-11-24 |
+----+-------+------------+
In this case, it means calculating the average of values for dates BEFORE 2020-11-26. This is the expected result
+----+-------+
| id | Value |
+----+-------+
| 1 | 46 |
| 2 | 48 |
| 3 | 33.5 |
+----+-------+
I have calculated it using ROWS N PRECEDING but it appears that this way I average N preceding + last row, and I want to exclude the last row (which is the most recent date in my case).
Here is my query:
SELECT ID,
(avg(Value) OVER(
PARTITION BY ID
ORDER BY Date
ROWS 9 PRECEDING )) as avg9
FROM t1
Then define your window frame in full, giving both its start and its end with BETWEEN:
SELECT ID,
(AVG(Value) OVER (PARTITION BY ID ORDER BY Date ROWS BETWEEN 9 PRECEDING AND 1 PRECEDING)) AS avg9
FROM t1;
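To make the frame's effect visible, the sketch below runs the same window in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer) on the sample table, with the date column added to the output so each row can be identified.

```python
import sqlite3

# In-memory SQLite database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, value INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, 51, "2020-11-26"), (1, 45, "2020-11-25"), (1, 47, "2020-11-24"),
        (2, 32, "2020-11-26"), (2, 51, "2020-11-25"), (2, 45, "2020-11-24"),
        (3, 47, "2020-11-26"), (3, 32, "2020-11-25"), (3, 35, "2020-11-24"),
    ],
)

# The frame ends at 1 PRECEDING, so the current row is never part of its own average.
windowed = conn.execute(
    """
    SELECT id, date,
           AVG(value) OVER (
               PARTITION BY id
               ORDER BY date
               ROWS BETWEEN 9 PRECEDING AND 1 PRECEDING
           ) AS avg9
    FROM t1
    ORDER BY id, date
    """
).fetchall()

for row in windowed:
    print(row)
```

Note that this produces one value per input row: the earliest row of each id has an empty frame and gets NULL, and only the row with the latest date carries the average the question asks for (46, 48, and 33.5).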
Why not just filter:
select id, avg(value)
from t1
where date < '2020-11-26'
group by id;
If you want the date to be flexible (say, the most recent date for each id), then:
select id, avg(value)
from (select t1.*,
max(date) over (partition by id) as max_date
from t1
) t1
where date < max_date
group by id;
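A small check of the flexible-date version against the question's sample table, sketched in SQLite via Python's sqlite3 module (assuming SQLite 3.25 or newer for the window function):

```python
import sqlite3

# In-memory SQLite database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, value INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, 51, "2020-11-26"), (1, 45, "2020-11-25"), (1, 47, "2020-11-24"),
        (2, 32, "2020-11-26"), (2, 51, "2020-11-25"), (2, 45, "2020-11-24"),
        (3, 47, "2020-11-26"), (3, 32, "2020-11-25"), (3, 35, "2020-11-24"),
    ],
)

# Tag every row with the latest date of its id, then average only the
# rows that are strictly older than that date.
averages = conn.execute(
    """
    SELECT id, AVG(value) AS avg_value
    FROM (
        SELECT t1.*,
               MAX(date) OVER (PARTITION BY id) AS max_date
        FROM t1
    ) AS s
    WHERE date < max_date
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(averages)
```

This returns one row per id, matching the expected result in the question.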
Use row_number() over (partition by id order by [Date] desc). This assigns 1 to the row with the latest date in each partition. Wrap it in a CTE and then calculate the average for each id over the rows where RN > 1.
;with a as
(
select id, value, Date, row_number() over (partition by id order by Date desc) as RN
from t1
)
select id, avg(Value) from a where RN > 1 group by id

Postgresql how to select columns where it matches conditions?

I have a table like this:
inventory_id | customer_id | max
--------------+-------------+---------------------
4497 | 1 | 2005-07-28 00:00:00
1449 | 1 | 2005-08-22 00:00:00
1440 | 1 | 2005-08-02 00:00:00
3232 | 1 | 2005-08-02 00:00:00
3418 | 2 | 2005-08-02 00:00:00
654 | 2 | 2005-08-02 00:00:00
3164 | 2 | 2005-08-21 00:00:00
2053 | 2 | 2005-07-27 00:00:00
I want to select, for each customer, the row with the most recent date, along with its other columns. This is what I want to achieve:
inventory_id | customer_id | max
--------------+-------------+---------------------
1449 | 1 | 2005-08-22 00:00:00
3164 | 2 | 2005-08-21 00:00:00
I tried to use an aggregate, but I need inventory_id and customer_id to appear together in the result.
Is there any method that could do this?
Use distinct on:
select distinct on (customer_id) t.*
from t
order by customer_id, max desc;
distinct on is a Postgres extension that returns one row per whatever is in the parentheses. The row chosen is based on the order by: it is the first one that appears in the sorted set.
SELECT inventory_id, customer_id, max FROM
(SELECT inventory_id, customer_id, max,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY max DESC) AS ROWNO
FROM inventory_table) AS A
WHERE ROWNO=1
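The ROW_NUMBER() version is portable, so it can be tried outside Postgres. Below is a sketch in SQLite via Python's sqlite3 module (assuming SQLite 3.25 or newer); the date column is renamed to last_date here, a hypothetical name chosen to avoid confusion with the MAX function.

```python
import sqlite3

# In-memory SQLite database with the sample rows from the question.
# last_date is a hypothetical name for the question's "max" column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory_table (inventory_id INTEGER, customer_id INTEGER, last_date TEXT)"
)
conn.executemany(
    "INSERT INTO inventory_table VALUES (?, ?, ?)",
    [
        (4497, 1, "2005-07-28"), (1449, 1, "2005-08-22"),
        (1440, 1, "2005-08-02"), (3232, 1, "2005-08-02"),
        (3418, 2, "2005-08-02"), (654, 2, "2005-08-02"),
        (3164, 2, "2005-08-21"), (2053, 2, "2005-07-27"),
    ],
)

# Number the rows of each customer from newest to oldest and keep number 1.
latest_rows = conn.execute(
    """
    SELECT inventory_id, customer_id, last_date
    FROM (
        SELECT inventory_id, customer_id, last_date,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY last_date DESC
               ) AS rowno
        FROM inventory_table
    ) AS a
    WHERE rowno = 1
    ORDER BY customer_id
    """
).fetchall()

print(latest_rows)
```

If two rows of the same customer shared the latest date, ROW_NUMBER() would keep an arbitrary one of them; adding a second ORDER BY key such as inventory_id would make the choice deterministic.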

How do I update a value based on the dense_rank result?

I have a query (SQL Server 2017) that finds two different discounts on the same date.
WITH CTE AS (SELECT [date_id], [good_id], [store_id], [name_promo_mech], [discount],
RN = DENSE_RANK() OVER (PARTITION BY [date_id], [good_id], [store_id], [name_promo_mech]
ORDER BY [discount]) +
DENSE_RANK() OVER (PARTITION BY [date_id], [good_id], [store_id], [name_promo_mech]
ORDER BY [discount] DESC) - 1
FROM [dbo].[ds_promo_list_by_day_new] AS PL
)
SELECT * FROM CTE
WHERE RN > 1;
GO
Query result:
+------------+----------+---------+-----------------+----------+----+
| date_id | store_id | good_id | name_promo_mech | discount | RN |
+------------+----------+---------+-----------------+----------+----+
| 2017-01-01 | 3 | 98398 | January 2017 | 15 | 2 |
+------------+----------+---------+-----------------+----------+----+
| 2017-01-01 | 3 | 98398 | January 2017 | 40 | 2 |
+------------+----------+---------+-----------------+----------+----+
| 2017-01-01 | 5 | 98398 | January 2017 | 15 | 3 |
+------------+----------+---------+-----------------+----------+----+
| 2017-01-01 | 5 | 98398 | January 2017 | 40 | 3 |
+------------+----------+---------+-----------------+----------+----+
| 2017-01-01 | 5 | 98398 | January 2017 | 30 | 3 |
+------------+----------+---------+-----------------+----------+----+
Now I want to make the discounts the same for every unique combination of good_id, store_id, and name_promo_mech in the source table. There is a rule for this. For example, if for good_id = 98398, store_id = 3, name_promo_mech = N'January 2017' there were 10 entries with a discount of 15 and 20 entries with a discount of 40, then the 15 discount should be replaced with 40. However, if the number of entries for each discount is the same, then the maximum discount is set for all of them.
Can I do this? The number of rows in the source table is about 100 million.
What you want to do is set the value to the mode (a statistical term for the most common value) for each combination of date, good, store, and promotion. You can use window functions:
with toupdate as (
select pl.*,
first_value(discount) over (partition by date_id, good_id, store_id, name_promo_mech order by cnt desc, discount desc) as mode_discount
from (select pl.*,
count(*) over (partition by date_id, good_id, store_id, name_promo_mech, discount) as cnt
from ds_promo_list_by_day_new pl
) pl
)
update toupdate
set discount = mode_discount
where mode_discount <> discount;
The subquery counts the number of rows for each discount within each combination of date, good, store, and promotion. The outer query gets the discount with the largest count and, in the case of ties, the larger value.
The rest is a simple update.
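The mode-with-tie-break rule can be tried on a toy table. The sketch below uses SQLite via Python's sqlite3 module, which cannot update through a CTE the way SQL Server can, so it swaps in a different mechanism: the mode per group is first materialized into a temporary table and then applied with a correlated UPDATE. The table name promo, the helper table mode_per_group, and the store 7 rows are hypothetical additions for illustration.

```python
import sqlite3

# In-memory SQLite database; promo stands in for ds_promo_list_by_day_new.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE promo (date_id TEXT, good_id INTEGER, store_id INTEGER,"
    " name_promo_mech TEXT, discount INTEGER)"
)
d, g, n = "2017-01-01", 98398, "January 2017"
conn.executemany(
    "INSERT INTO promo VALUES (?, ?, ?, ?, ?)",
    [
        (d, g, 3, n, 15), (d, g, 3, n, 40), (d, g, 3, n, 40),  # 40 is most common
        (d, g, 5, n, 15), (d, g, 5, n, 40), (d, g, 5, n, 30),  # three-way tie
        (d, g, 7, n, 10), (d, g, 7, n, 10), (d, g, 7, n, 25),  # hypothetical rows
    ],
)

# Step 1: count rows per discount, then pick the top discount per group,
# ordering by count first and by discount value to break ties.
conn.execute(
    """
    CREATE TEMP TABLE mode_per_group AS
    SELECT date_id, good_id, store_id, name_promo_mech,
           discount AS mode_discount
    FROM (
        SELECT c.*,
               ROW_NUMBER() OVER (
                   PARTITION BY date_id, good_id, store_id, name_promo_mech
                   ORDER BY cnt DESC, discount DESC
               ) AS rn
        FROM (
            SELECT date_id, good_id, store_id, name_promo_mech, discount,
                   COUNT(*) AS cnt
            FROM promo
            GROUP BY date_id, good_id, store_id, name_promo_mech, discount
        ) AS c
    ) AS r
    WHERE rn = 1
    """
)

# Step 2: overwrite every row's discount with the mode of its group.
conn.execute(
    """
    UPDATE promo
    SET discount = (
        SELECT m.mode_discount
        FROM mode_per_group AS m
        WHERE m.date_id = promo.date_id
          AND m.good_id = promo.good_id
          AND m.store_id = promo.store_id
          AND m.name_promo_mech = promo.name_promo_mech
    )
    """
)

after = conn.execute(
    "SELECT DISTINCT store_id, discount FROM promo ORDER BY store_id"
).fetchall()
print(after)
```

Store 3 ends up at 40 (the more frequent value), store 5 at 40 (the tie goes to the maximum), and store 7 at 10 (the more frequent value wins even though 25 is larger). On a 100-million-row table, restricting the update to rows whose discount differs from the mode, as the answer's WHERE clause does, keeps the number of written rows down.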