SQL query needed - Counting 365 days backwards - sql

I have searched the forum many times but couldn't find a solution for my situation. I am working with an Oracle database.
I have a table with all Order Numbers and Customer Numbers by Day. It looks like this:
Day | Customer Nbr | Order Nbr
2018-01-05 | 25687459 | 256
2018-01-09 | 36478592 | 398
2018-03-07 | 25687459 | 1547
and so on....
Now I need a SQL Query which gives me a table by day and Customer Nbr and counts the number of unique Order Numbers within the last 365 days starting from column 1.
For the example above the resulting table should look like:
Day | Customer Nbr | Order Cnt
2019-01-01 | 25687459 | 2
2019-01-02 | 25687459 | 2
...
2019-03-01 | 25687459 | 1

One method is to generate values for all days of interest for each customer and then use a correlated subquery:
with dates as (
select date '2019-01-01' + rownum as dte from dual
connect by date '2019-01-01' + rownum < sysdate
)
select d.dte, t.customer_nbr,
(select count(*)
from t t2
where t2.customer_nbr = t.customer_nbr and
t2.day <= t.dte and
t2.date > t.dte - 365
) as order_cnt
from dates d cross join
(select distinct customer_nbr from t) ;

Edit:
I've just seen you clarify the question, which I've interpreted to mean:
For every day in the last year, show how many orders there were for each customer between that date, and 1 year previously. Working on an answer now...
Updated Answer:
For each customer, we count the number of records between the order day, and 365 days before it...
WITH yourTable AS
(
SELECT SYSDATE - 1 Day, 'Alex' CustomerNbr FROM DUAL
UNION ALL
SELECT SYSDATE - 2, 'Alex' FROM DUAL
UNION ALL
SELECT SYSDATE - 366, 'Alex'FROM DUAL
UNION ALL
SELECT SYSDATE - 400, 'Alex'FROM DUAL
UNION ALL
SELECT SYSDATE - 500, 'Alex'FROM DUAL
UNION ALL
SELECT SYSDATE - 1, 'Joe'FROM DUAL
UNION ALL
SELECT SYSDATE - 300, 'Chris'FROM DUAL
UNION ALL
SELECT SYSDATE - 1, 'Chris'FROM DUAL
)
SELECT Day, CustomerNbr, OrdersLast365Days
FROM yourTable t
OUTER APPLY
(
SELECT COUNT(1) OrdersLast365Days
FROM yourTable t2
WHERE t.CustomerNbr = t2.CustomerNbr
AND TRUNC(t2.Day) >= TRUNC(t.Day) - 364
AND TRUNC(t2.Day) <= TRUNC(t.Day)
)
ORDER BY t.Day DESC, t.CustomerNbr;
If you want to report on just the days you have orders for, then a simple WHERE clause should be enough:
SELECT Day, CustomerNbr, COUNT(1) OrderCount
FROM <yourTable>
WHERE TRUNC(DAY) >= TRUNC(SYSDATE -364)
GROUP BY Day, CustomerNbr
ORDER BY Day Desc;
If you want to report on every day, you'll need to generate them first. This can be done by a recursive CTE, which you then join to your table:
WITH last365Days AS
(
SELECT TRUNC (SYSDATE - ROWNUM + 1) dt
FROM DUAL CONNECT BY ROWNUM < 365
)
SELECT d.Day, COALESCE(t.CustomerNbr, 'None') CustomerNbr, SUM(CASE WHEN t.CustomerNbr IS NULL THEN 0 ELSE 1 END) OrderCount
FROM last365Days d
LEFT OUTER JOIN <yourTable> t
ON d.Day = TRUNC(t.Day)
GROUP BY d.Day, t.CustomerNbr
ORDER BY d.Day Desc;

I would probably have done it with and analytic function. In your windowing clause, you can specify a number of rows before, or a range. In this case I will use a range.
This will give you, For Each customer for each day the number of orders during one rolling year before the date displayed
WITH DATES AS (
SELECT * FROM
(SELECT TRUNC(SYSDATE)-(LEVEL-1) AS DAY FROM DUAL CONNECT BY TRUNC(SYSDATE)-(LEVEL-1) >= ( SELECT MIN(TRUNC(DAY)) FROM MY_TABLE ))
CROSS JOIN
(SELECT DISTINCT CUST_ID FROM MY_TABLE))
SELECT DISTINCT
DATES.DAY,
DATES.CUST_ID,
COUNT(ORDER_ID) OVER (PARTITION BY DATES.CUST_ID ORDER BY DATES.DAY RANGE BETWEEN INTERVAL '1' YEAR PRECEDING AND INTERVAL '1' SECOND PRECEDING)
FROM
DATES
LEFT JOIN
MY_TABLE
ON DATES.DAY=TRUNC(MY_TABLE.DAY) AND DATES.CUST_ID=MY_TABLE.CUST_ID
ORDER BY DATES.CUST_ID,DATES.DAY;

Related

SQL: How to create a daily view based on different time intervals using SQL logic?

Here is an example:
Id|price|Date
1|2|2022-05-21
1|3|2022-06-15
1|2.5|2022-06-19
Needs to look like this:
Id|Date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
...
1|2022-06-15|3
1|2022-06-16|3
1|2022-06-17|3
1|2022-06-18|3
1|2022-06-19|2.5
1|2022-06-20|2.5
...
Until today
1|2022-08-30|2.5
I tried using the lag(price) over (partition by id order by date)
But i can't get it right.
I'm not familiar with Azure, but it looks like you need to use a calendar table, or generate missing dates using a recursive CTE.
To get started with a recursive CTE, you can generate line numbers for each id (assuming multiple id values) in the source data ordered by date. These rows with row number equal to 1 (with the minimum date value for the corresponding id) will be used as the starting point for the recursion. Then you can use the DATEADD function to generate the row for the next day. To use the price values ​​from the original data, you can use a subquery to get the price for this new date, and if there is no such value (no row for this date), use the previous price value from CTE (use the COALESCE function for this).
For SQL Server query can look like this
WITH cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATEADD(d, 1, cte.date),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATEADD(d, 1, cte.date)),
cte.price
)
FROM cte
WHERE DATEADD(d, 1, cte.date) <= GETDATE()
)
SELECT * FROM cte
ORDER BY id, date
OPTION (MAXRECURSION 0)
Note that I added OPTION (MAXRECURSION 0) to make the recursion run through all the steps, since the default value is 100, this is not enough to complete the recursion.
db<>fiddle here
The same approach for MySQL (you need MySQL of version 8.0 to use CTE)
WITH RECURSIVE cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATE_ADD(cte.date, interval 1 day),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATE_ADD(cte.date, interval 1 day)),
cte.price
)
FROM cte
WHERE DATE_ADD(cte.date, interval 1 day) <= NOW()
)
SELECT * FROM cte
ORDER BY id, date
db<>fiddle here
Both queries produces the same results, the only difference is the use of the engine's specific date functions.
For MySQL versions below 8.0, you can use a calendar table since you don't have CTE support and can't generate the required date range.
Assuming there is a column in the calendar table to store date values ​​(let's call it date for simplicity) you can use the CROSS JOIN operator to generate date ranges for the id values in your table that will match existing dates. Then you can use a subquery to get the latest price value from the table which is stored for the corresponding date or before it.
So the query would be like this
SELECT
d.id,
d.date,
(SELECT
price
FROM tbl
WHERE tbl.id = d.id AND tbl.date <= d.date
ORDER BY tbl.date DESC
LIMIT 1
) price
FROM (
SELECT
t.id,
c.date
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM tbl) t
WHERE c.date BETWEEN (
SELECT
MIN(date) min_date
FROM tbl
WHERE tbl.id = t.id
)
AND NOW()
) d
ORDER BY id, date
Using my pseudo-calendar table with date values ranging from 2022-05-20 to 2022-05-30 and source data in that range, like so
id
price
date
1
2
2022-05-21
1
3
2022-05-25
1
2.5
2022-05-28
2
10
2022-05-25
2
100
2022-05-30
the query produces following results
id
date
price
1
2022-05-21
2
1
2022-05-22
2
1
2022-05-23
2
1
2022-05-24
2
1
2022-05-25
3
1
2022-05-26
3
1
2022-05-27
3
1
2022-05-28
2.5
1
2022-05-29
2.5
1
2022-05-30
2.5
2
2022-05-25
10
2
2022-05-26
10
2
2022-05-27
10
2
2022-05-28
10
2
2022-05-29
10
2
2022-05-30
100
db<>fiddle here

sql get balance at end of year

I have a transactions table for a single year with the amount indicating the debit transaction if the value is negative or credit transaction values are positive.
Now in a given month if the number of debit records is less than 3 or if the sum of debits for a month is less than 100 then I want to charge a fee of 5.
I want to build and sql query for this in postgre:
select sum(amount), count(1), date_part('month', date) as month from transactions where amount < 0 group by month;
I am able get records per month level, I am stuck on how to proceed further and get the result.
You can start by generating the series of month with generate_series(). Then join that with an aggregate query on transactions, and finally implement the business logic in the outer query:
select sum(t.balance)
- 5 * count(*) filter(where coalesce(t.cnt, 0) < 3 or coalesce(t.debit, 0) < 100) as balance
from generate_series(date '2020-01-01', date '2020-12-01', '1 month') as d(dt)
left join (
select date_trunc('month', date) as dt, count(*) cnt, sum(amount) as balance,
sum(-amount) filter(where amount < 0) as debit
from transactions t
group by date_trunc('month', date)
) t on t.dt = d.dt
Demo on DB Fiddle:
| balance |
| ------: |
| 2746 |
How about this approach?
SELECT
SUM(
CASE
WHEN usage.amount_s > 100
OR usage.event_c > 3
THEN 0
ELSE 5
END
) AS YEAR_FEE
FROM (SELECT 1 AS month UNION
SELECT 2 UNION
SELECT 3 UNION
SELECT 4 UNION
SELECT 5 UNION
SELECT 6 UNION
SELECT 7 UNION
SELECT 8 UNION
SELECT 9 UNION
SELECT 10 UNION
SELECT 11 UNION
SELECT 12
) months
LEFT OUTER JOIN
(
SELECT
sum(amount) AS amount_s,
count(1) event_c,
date_part('month', date) AS month
FROM transactions
WHERE amount < 0
GROUP BY month
) usage ON months.month = usage.month;
First you must use a resultset that returns all the months (1-12) and join it with a LEFT join to your table.
Then aggregate to get the the sum of each month's amount and with conditional aggregation subtract 5 from the months that meet your conditions.
Finally use SUM() window function to sum the result of each month:
SELECT DISTINCT SUM(
COALESCE(SUM(t.Amount), 0) -
CASE
WHEN SUM((t.Amount < 0)::int) < 3
OR SUM(CASE WHEN t.Amount < 0 THEN -t.Amount ELSE 0 END) < 100 THEN 5
ELSE 0
END
) OVER () total
FROM generate_series(1, 12, 1) m(month) LEFT JOIN transactions t
ON m.month = date_part('month', t.date) AND date_part('year', t.date) = 2020
GROUP BY m.month
See the demo.
Results:
> | total |
> | ----: |
> | 2746 |
I think you can use the hanving clause.
Select ( sum(a.total) - (12- count(b.cnt ))*5 ) as result From
(Select sum(amount) as total , 'A' as name from transactions ) as a left join
(Select count(amount) as cnt , 'A' as name
From transactions
where amount <0
group by month(date)
having not(count(amount) <3 or sum(amount) >-100) ) as b
on a.name = b.name
select
sum(amount) - 5*(12-(
select count(*)
from(select month, count(amount),sum(amount)
from transactions
where amount<0
group by month
having Count(amount)>=3 And Sum(amount)<=-100))) as balance
from transactions ;

Calculate standdard deviation over time

I have information about sales per day. For example:
Date - Product - Amount
01-07-2020 - A - 10
01-03-2020 - A - 20
01-02-2020 - B - 10
Now I would like to know the average sales per day and the standard deviation for the last year. For average I can just count the number of entries per item, and then count 365-amount of entries and take that many 0's, but I wonder what the best way is to calculate the standard deviation while incorporating the 0's for the days there are not sales.
Use a hierarchical (or recursive) query to generate daily dates for the year and then use a PARTITION OUTER JOIN to join it to your product data then you can find the average and standard deviation with the AVG and STDDEV aggregation functions and use COALESCE to fill in NULL values with zeroes:
WITH start_date ( dt ) AS (
SELECT DATE '2020-01-01' FROM DUAL
),
calendar ( dt ) AS (
SELECT dt + LEVEL - 1
FROM start_date
CONNECT BY dt + LEVEL - 1 < ADD_MONTHS( dt, 12 )
)
SELECT product,
AVG( COALESCE( amount, 0 ) ) AS average_sales_per_day,
STDDEV( COALESCE( amount, 0 ) ) AS stddev_sales_per_day
FROM calendar c
LEFT OUTER JOIN (
SELECT t.*
FROM test_data t
INNER JOIN start_date s
ON (
s.dt <= t."DATE"
AND t."DATE" < ADD_MONTHS( s.dt, 12 )
)
) t
PARTITION BY ( t.product )
ON ( c.dt = t."DATE" )
GROUP BY product
So, for your sample data:
CREATE TABLE test_data ( "DATE", Product, Amount ) AS
SELECT DATE '2020-07-01', 'A', 10 FROM DUAL UNION ALL
SELECT DATE '2020-03-01', 'A', 20 FROM DUAL UNION ALL
SELECT DATE '2020-02-01', 'B', 10 FROM DUAL;
This outputs:
PRODUCT | AVERAGE_SALES_PER_DAY | STDDEV_SALES_PER_DAY
:------ | ----------------------------------------: | ----------------------------------------:
A | .0819672131147540983606557377049180327869 | 1.16752986363678031669548047505759328696
B | .027322404371584699453551912568306010929 | .5227083734893166933219264686616717636897
db<>fiddle here

How to select rows where numbers have increased from previous record in SQL

I have a table with 2 columns date and sales, from this I need to pick up the dates on which sales have increased from previous date. Below is a sample table
Date Sales
-------------------
1/8/2020 10
1/9/2020 12
1/10/2020 8
1/11/2020 7
1/12/2020 13
Output should be as below:
Date
---------
1/9/2020
1/12/2020
Query:
Select data
from table
where sales > sales of previous day
You can use LAG to calculate this:
with cte
as (select date_c
, sales
, lag(sales) over (order by date_c) sales2
from Test)
select date_c, sales from cte
where sales > sales2;
Here is a DEMO
If you have gaps in days, you can consider this following logic with sub query-
DEMO HERE
WITH CTE AS
(
SELECT Date,Sales,
(
SELECT Sales
FROM your_table
WHERE Date = (SELECT MAX(Date) FROM your_table WHERE Date < A.Date)
) Last_day_sales
FROM your_table A
)
SELECT Date,Sales
FROM CTE
WHERE Sales > Last_day_sales*emphasized text*
You can use self join for this requirement.
select
A.date
from TableA as A
inner join TableA as B on B.date = (A.date - interval '1 day')
and A.sales > B.sales;

Add Missing monthly dates in a timeseries data in Postgresql

I have monthly time series data in table where dates are as a last day of month. Some of the dates are missing in the data. I want to insert those dates and put zero value for other attributes.
Table is as follows:
id report_date price
1 2015-01-31 40
1 2015-02-28 56
1 2015-04-30 34
2 2014-05-31 45
2 2014-08-31 47
I want to convert this table to
id report_date price
1 2015-01-31 40
1 2015-02-28 56
1 2015-03-31 0
1 2015-04-30 34
2 2014-05-31 45
2 2014-06-30 0
2 2014-07-31 0
2 2014-08-31 47
Is there any way we can do this in Postgresql?
Currently we are doing this in Python. As our data is growing day by day and its not efficient to handle I/O just for one task.
Thank you
You can do this using generate_series() to generate the dates and then left join to bring in the values:
with m as (
select id, min(report_date) as minrd, max(report_date) as maxrd
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select m.*, generate_series(minrd, maxrd, interval '1' month) as report_date
from m
) m left join
t
on m.report_date = t.report_date;
EDIT:
Turns out that the above doesn't quite work, because adding months to the end of month doesn't keep the last day of the month.
This is easily fixed:
with t as (
select 1 as id, date '2012-01-31' as report_date, 10 as price union all
select 1 as id, date '2012-04-30', 20
), m as (
select id, min(report_date) - interval '1 day' as minrd, max(report_date) - interval '1 day' as maxrd
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select m.*, generate_series(minrd, maxrd, interval '1' month) + interval '1 day' as report_date
from m
) m left join
t
on m.report_date = t.report_date;
The first CTE is just to generate sample data.
This is a slight improvement over Gordon's query which fails to get the last date of a month in some cases.
Essentially you generate all the month end dates between the min and max date for each id (using generate_series) and left join on this generated table to show the missing dates with 0 price.
with minmax as (
select id, min(report_date) as mindt, max(report_date) as maxdt
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select *,
generate_series(date_trunc('MONTH',mindt+interval '1' day),
date_trunc('MONTH',maxdt+interval '1' day),
interval '1' month) - interval '1 day' as report_date
from minmax
) m
left join t on m.report_date = t.report_date
Sample Demo