Writing subquery within SUM using values of 1 table - sql

I have a table, and for each book_id I am trying to calculate the total sales over the past 100 days, for every day in the past year.
book_id  location  seller  daily_sales  order_day
ABC      1         XYZ     100          2017-05-05
ABC      1         XYZ     120          2017-05-07
ABC      1         XYZ     40           2017-02-10
...
The result I expect is:
book_id order_day sum
ABC 2017-05-05 100+40
ABC 2017-05-07 100+120+40
ABC 2017-02-10 40
To do this, I wrote the following query:
select book_id, to_char(order_day),
SUM(case when order_day between order_day -100 and order_day then daily_sales else 0 end) sum
FROM bookDetailsTable
where location = 1 AND ORDER_DAY BETWEEN TO_DATE('20170725','YYYYMMDD') - 359 AND TO_DATE('20170725','YYYYMMDD')
group by seller, book_id, order_day
I suspect I am doing this wrong, and that I need a SELECT statement inside the SUM to pick up the data for the past 100 days.

This should give you the result:
select a.book_id,
a.order_day,
-- correlated subquery: sums the same book's sales in the 100 days up to this row's day
( select sum(b.daily_sales)
from bookDetailsTable b
where b.book_id = a.book_id
and b.order_day between a.order_day - 100 and a.order_day
) as sales_last_100_days
from bookDetailsTable a
where a.order_day between ADD_MONTHS(trunc(sysdate), -12) and trunc(sysdate)
Once you understand how the query works, you should be able to add your other restrictions, such as seller or location.
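To check the correlated-subquery idea locally, here is a minimal sketch using Python's built-in sqlite3 module instead of Oracle. It assumes the three sample rows from the question (location and seller omitted), stores order_day as ISO text, and uses SQLite's julianday() in place of Oracle's date arithmetic; the one-year filter on sysdate is left out so the output does not depend on today's date.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table from the question.
# SQLite has no DATE type, so order_day is ISO text and julianday() does the
# "minus 100 days" arithmetic.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookDetailsTable (book_id TEXT, daily_sales INTEGER, order_day TEXT)"
)
conn.executemany(
    "INSERT INTO bookDetailsTable VALUES (?, ?, ?)",
    [("ABC", 100, "2017-05-05"), ("ABC", 120, "2017-05-07"), ("ABC", 40, "2017-02-10")],
)

# For every row a, the scalar subquery sums the sales of the same book whose
# order_day lies in [a.order_day - 100 days, a.order_day].
rows = conn.execute(
    """
    SELECT a.book_id,
           a.order_day,
           (SELECT SUM(b.daily_sales)
              FROM bookDetailsTable b
             WHERE b.book_id = a.book_id
               AND julianday(b.order_day)
                   BETWEEN julianday(a.order_day) - 100 AND julianday(a.order_day)
           ) AS sum_last_100_days
      FROM bookDetailsTable a
     ORDER BY a.order_day
    """
).fetchall()

for row in rows:
    print(row)
```

The three sums come out as 40, 140 (100+40) and 260 (100+120+40), matching the expected result in the question.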

This is a perfect case for using analytic functions, specifically the SUM() analytic function, along with the windowing clause:
WITH bookdetailstable AS (SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 100 daily_sales, to_date('05/05/2016', 'dd/mm/yyyy') order_day FROM dual UNION ALL
SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 120 daily_sales, to_date('07/05/2016', 'dd/mm/yyyy') order_day FROM dual UNION ALL
SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 40 daily_sales, to_date('10/02/2016', 'dd/mm/yyyy') order_day FROM dual UNION ALL
SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 600 daily_sales, to_date('10/02/2017', 'dd/mm/yyyy') order_day FROM dual)
SELECT book_id,
to_char(order_day, 'yyyy-mm-dd') order_day,
total_sales_last_100_days
FROM (SELECT book_id,
order_day,
SUM(daily_sales) OVER (PARTITION BY book_id ORDER BY order_day
RANGE BETWEEN 100 PRECEDING AND CURRENT ROW) total_sales_last_100_days
FROM bookdetailstable
where order_day >= add_months(trunc(sysdate) - 100, -12))
where order_day >= add_months(trunc(SYSDATE), -12);
BOOK_ID ORDER_DAY TOTAL_SALES_LAST_100_DAYS
------- ---------- -------------------------
ABC 2016-02-10 40
ABC 2016-05-05 140
ABC 2016-05-07 260
ABC 2017-02-10 600
This simply says: get the sum of daily_sales for each book_id (you can think of the PARTITION BY clause as similar to GROUP BY; it simply defines the group of rows the function applies over), ordered by order_day, over every row whose order_day falls within the 100 days up to and including the current row's order_day. Because the window is defined with RANGE and order_day is a DATE, the 100 is measured in days, not rows; ROWS BETWEEN 100 PRECEDING AND CURRENT ROW would look at the 100 preceding rows instead.
If you need the rolling sum per book_id and location (and seller, and so on), add those extra grouping columns to the PARTITION BY clause.
Since you want to restrict the results to the past year, and assuming you want the first row of that year to include the sales of the previous 100 days as well (rather than starting from its own day), the inner query has to read back to 100 days before a year ago. The outer query then restricts the rows to the year's worth of data you're interested in.
That's because analytic functions are evaluated after the WHERE clause has filtered the rows, so if the window needs rows from outside the final date range, you have to keep those rows in an inner query and do the additional filtering in an outer query.
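A minimal sketch of the same two-stage pattern in SQLite, via Python's sqlite3 (RANGE with a numeric offset needs SQLite 3.28 or newer). Assumptions: julianday(order_day) stands in for Oracle's DATE ordering so that the RANGE offset of 100 means 100 days, a fixed reference date of 2017-02-10 replaces sysdate so the result is reproducible, and one hypothetical extra row (2016-01-15) has been added that falls before the one-year window but within 100 days of its first row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookdetailstable (book_id TEXT, daily_sales INTEGER, order_day TEXT)"
)
conn.executemany(
    "INSERT INTO bookdetailstable VALUES (?, ?, ?)",
    [
        ("ABC", 100, "2016-05-05"),
        ("ABC", 120, "2016-05-07"),
        ("ABC", 40, "2016-02-10"),
        ("ABC", 600, "2017-02-10"),
        ("ABC", 5, "2016-01-15"),  # hypothetical row just before the one-year window
    ],
)

query = """
SELECT book_id, order_day, total_sales_last_100_days
  FROM (SELECT book_id,
               order_day,
               -- RANGE: the offset is in ORDER BY units (days here), not rows
               SUM(daily_sales) OVER (
                   PARTITION BY book_id
                   ORDER BY julianday(order_day)
                   RANGE BETWEEN 100 PRECEDING AND CURRENT ROW
               ) AS total_sales_last_100_days
          FROM bookdetailstable
         -- inner filter: one year plus 100 days, so early windows are complete
         WHERE order_day >= date(:ref, '-100 days', '-12 months'))
 -- outer filter: only the one-year range is returned
 WHERE order_day >= date(:ref, '-12 months')
 ORDER BY order_day
"""
rows = conn.execute(query, {"ref": "2017-02-10"}).fetchall()
for row in rows:
    print(row)
```

The 2016-02-10 total is 45 (40 plus the hypothetical 5 from 2016-01-15), while the 2016-01-15 row itself is dropped by the outer filter; the other three totals (140, 260, 600) match the output above.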

Related

Cumulative average and count over occurrences increasing in time

I am looking to calculate, in Oracle SQL, an average (over the number of occurrences) and an observation count over increasing dates per instance (take a customer as an example instance).
So the count will increase as the date goes up, while the average could go up or down.
I can do it for an individual case and a fixed time interval, but I would like to see a series for every customer, with each row being a separate date on which a sale occurred. Right now, I get a single row per customer. Here is the SQL summarizing the average and count for a fixed time interval:
SELECT AVG(bought_usd) as avg_bought
, COUNT(*) as num_of_interactions
, cust_id
FROM salesTable
WHERE obsdate >= DATE '2000-01-01'
AND obsdate <= DATE '2022-01-01'
GROUP BY cust_id
So for an input of:
the output should look like:
Use analytic functions:
SELECT "DATE",
cust,
AVG(bought_usd) OVER (PARTITION BY cust ORDER BY "DATE") AS avg,
COUNT(*) OVER (PARTITION BY cust ORDER BY "DATE") AS cnt
FROM salestable
ORDER BY cust, "DATE"
Note: DATE is a reserved word. You should not use it as an identifier.
Which, for the sample data:
CREATE TABLE salestable ("DATE", cust, bought_usd) AS
SELECT DATE '2010-10-01', 'Cust A', 100 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust A', 50 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust B', 120 FROM DUAL UNION ALL
SELECT DATE '2011-10-01', 'Cust B', 180 FROM DUAL;
Outputs:
DATE                 CUST    AVG  CNT
2010-10-01 00:00:00  Cust A  100  1
2010-12-18 00:00:00  Cust A  75   2
2010-12-18 00:00:00  Cust B  120  1
2011-10-01 00:00:00  Cust B  150  2
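The same running AVG/COUNT pattern can be tried in SQLite through Python's sqlite3 module. This sketch renames the DATE column to dt (an assumption, to avoid the reserved word) and reuses the sample rows above. With ORDER BY and no explicit frame, the window runs from the first row of the partition up to the current row (and its same-date peers), which is what makes both values cumulative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salestable (dt TEXT, cust TEXT, bought_usd INTEGER)")
conn.executemany(
    "INSERT INTO salestable VALUES (?, ?, ?)",
    [
        ("2010-10-01", "Cust A", 100),
        ("2010-12-18", "Cust A", 50),
        ("2010-12-18", "Cust B", 120),
        ("2011-10-01", "Cust B", 180),
    ],
)

# Default frame with ORDER BY: partition start up to the current row,
# so both aggregates are running values per customer.
rows = conn.execute(
    """
    SELECT dt,
           cust,
           AVG(bought_usd) OVER (PARTITION BY cust ORDER BY dt) AS avg_usd,
           COUNT(*)        OVER (PARTITION BY cust ORDER BY dt) AS cnt
      FROM salestable
     ORDER BY cust, dt
    """
).fetchall()
for row in rows:
    print(row)
```

SQLite returns AVG as a floating-point value (100.0, 75.0, 120.0, 150.0), but the numbers match the Oracle output above.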

How to calculate needed amount for supply order?

Table "client_orders":
date   ordered  id
28.05  50       1
23.06  60       2
24.05  50       1
25.06  130      2
Table "stock":
id  amount  date
1   60      23.04
2   90      25.04
1   10      24.04
2   10      24.06
I want to calculate how much I need to order (to replenish the stock) and by what date. For instance, the result should be:
30 by 28.05 (60+10-50-50=-30) for id = 1
-90 by 25.06 (90-60+10-130=-90) for id = 2
I tried to do it with the LAG function, but the problem is that the stock does not get updated from row to row.
SELECT *,
SUM(amount - ordered) OVER (PARTITION BY sd.id ORDER BY d.date ASC)
FROM stock sd
LEFT JOIN (SELECT date,
id,
ordered
FROM client_orders) AS d
ON sd.id = d.id
I couldn't find anything similar on the web. I'd be grateful if you could share articles or examples showing how to do this.
You could take the union of the two tables and, per id, sum all the stock amounts together with the negated ordered amounts. For the date, you could take the corresponding maximum value instead.
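SELECT id,
SUM(amount) AS balance,
MAX(date) AS last_date
FROM (SELECT id,
-ordered AS amount,
date
FROM client_orders
-- UNION ALL, not UNION: UNION would silently drop duplicate rows,
-- e.g. two identical orders for the same id on the same date
UNION ALL
SELECT id, amount, date
FROM stock
) stock_and_orders
GROUP BY id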
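A sketch of the union-and-sum approach in SQLite via Python's sqlite3, using the sample rows from the question. The dates are kept as the question's 'DD.MM' text, so MAX() compares them as strings; that happens to pick the right dates for this data but would not work in general (for example, '30.04' sorts after '28.05'), so a real table should use a proper DATE column. UNION ALL is used so that identical order rows are not collapsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE client_orders ("date" TEXT, ordered INTEGER, id INTEGER)')
conn.execute('CREATE TABLE stock (id INTEGER, amount INTEGER, "date" TEXT)')
conn.executemany(
    "INSERT INTO client_orders VALUES (?, ?, ?)",
    [("28.05", 50, 1), ("23.06", 60, 2), ("24.05", 50, 1), ("25.06", 130, 2)],
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [(1, 60, "23.04"), (2, 90, "25.04"), (1, 10, "24.04"), (2, 10, "24.06")],
)

rows = conn.execute(
    """
    SELECT id,
           SUM(amount) AS balance,     -- stock minus orders
           MAX("date") AS last_date    -- text comparison on 'DD.MM' (see note)
      FROM (SELECT id, -ordered AS amount, "date" FROM client_orders
            UNION ALL                  -- keep duplicate rows
            SELECT id, amount, "date" FROM stock) AS stock_and_orders
     GROUP BY id
     ORDER BY id
    """
).fetchall()
for row in rows:
    print(row)
```

Both balances come out negative (-30 for id 1 and -90 for id 2), i.e. the amounts that have to be ordered by 28.05 and 25.06 respectively.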

Add a column with each customer's order count at the time they placed the order

I have the following table
order_id  created_at  customer_id
1         2020-01-02  11
2         2020-02-03  12
3         2020-02-03  11
I would like to add a column "customer_orders_count" that gives each transaction the number of orders the customer had placed up to that point, i.e. obtain this table:
order_id  created_at  customer_id  customer_orders_count
1         2020-01-02  11           1
2         2020-02-03  12           1
3         2020-02-03  11           2
My problem is that I can't find how to calculate a local "customer_orders_count" that depends on each order. I only managed to add a column with the global "customer_orders_count", so, for example, for the first row (order_id=1) I get customer_orders_count=2, whereas I would like it to be 1.
Does anyone have an idea?
Use cumulative count:
with mytable as (
select 1 as order_id, date '2020-01-02' as created_at, 11 as customer_id union all
select 2, date '2020-02-03', 12 union all
select 3, date '2020-02-03', 11
)
select *, count(*) over (partition by customer_id order by created_at) as customer_orders_count
from mytable
order by order_id
Use row_number():
select t.*,
row_number() over (partition by customer_id order by created_at) as customer_order_count
from t;
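This is subtly different from using a cumulative count(). This version guarantees that the numbers for a given customer are never duplicated, even when the dates are the same. A cumulative count has no such guarantee: with ORDER BY and no explicit frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so a customer's orders with the same created_at are peers and all receive the same count.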
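To see the difference between the two answers, here is a sketch in SQLite via Python's sqlite3. It uses the question's three orders plus a hypothetical fourth order (order_id 4) for customer 11 on the same date as order 3, and it adds order_id as a tie-breaker for ROW_NUMBER() (an assumption; without it, the numbering among ties is arbitrary).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, created_at TEXT, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2020-01-02", 11),
        (2, "2020-02-03", 12),
        (3, "2020-02-03", 11),
        (4, "2020-02-03", 11),  # hypothetical: same customer, same date as order 3
    ],
)

rows = conn.execute(
    """
    SELECT order_id,
           customer_id,
           -- default RANGE frame: same-date orders are peers and share a count
           COUNT(*) OVER (PARTITION BY customer_id ORDER BY created_at) AS cumulative_count,
           -- ROW_NUMBER always numbers rows 1, 2, 3, ... within the partition
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY created_at, order_id) AS row_num
      FROM orders
     ORDER BY order_id
    """
).fetchall()
for row in rows:
    print(row)
```

Orders 3 and 4 both get a cumulative count of 3, while ROW_NUMBER() gives them 2 and 3.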

subtract and add between columns and rows

I have some data that looks like this:
id  date        total_amount  adj_amount
1   2017-01-02  100           50
1   2017-01-02  50            0
2   2017-01-15  100           35
2   2017-01-15  35            0
3   2017-01-30  120           50
3   2017-01-30  -120          -50
3   2017-01-30  100           50
3   2017-01-30  50            0
3   2017-01-30  60            40
The output should look like this. I have no clue how to do the subtraction across rows and columns.
id  date        due_amount
1   2017-01-02  0
2   2017-01-15  0
3   2017-01-30  40
Here is my current code, but it may only work for ids 1 and 2, and it definitely does not work for id 3.
The logic is to find the due amount across the entries of each id. For example, id 1 has two entries: in the first, the total amount is 100 and he paid 50, so the adj amount is 50; in the second, the total amount is 50 and he paid 50, so the adj amount is 0. So id 1's due amount is 0 in the end.
Id 3 has 5 entries. The first entry shows that the total amount for id 3 is 120 and he paid 70, so the adj amount is 50, but the first entry is a mistake, so the second entry reverses all the amounts. The third entry shows that the total amount is 100 and id 3 paid 50, so the adj amount is 50. The fourth entry shows that the total amount is 50 and id 3 also paid 50, so the adj amount is 0. The fifth entry shows that the total amount is 60 and id 3 paid 20, so the adj amount is 40. So, in the end, id 3's due amount is 40.
select distinct a.id,
a.date,
case when a.date=b.date and a.total_amount = b.adj_amount then a.adj_amount
when a.date=b.date and a.total_amount <> b.adj_amount then ABS(a.adj_amount + b.adj_amount)
else a.adj_amount
end as due_amount
from table a,
table b
where a.id=b.id;
I just wonder whether there is any function that can do this kind of calculation across rows and columns.
Use GROUP BY and SUM().
SELECT the_date, SUM(due_amount)
FROM tab
GROUP BY the_date;
Something like this could work, provided the transactions can be ordered. Note that I've renamed some of the columns to help clarify their meaning. I've also added a trans_seq_num column to indicate the order of a customer's transactions on a particular date. I think you're looking for the amount that the customer still owes as of their last payment.
WITH sample (id, trans_seq_num, some_date, starting_balance, ending_balance) AS
(
SELECT '1',1,'2017-01-02','100','50' FROM dual UNION ALL
SELECT '1',2,'2017-01-02','50','0' FROM dual UNION ALL
SELECT '2',1,'2017-01-15','35','0' FROM dual UNION ALL
SELECT '2',2,'2017-01-15','100','35' FROM dual UNION ALL
SELECT '3',1,'2017-01-30','120','50' FROM dual UNION ALL
SELECT '3',2,'2017-01-30','-120','-50' FROM dual UNION ALL
SELECT '3',3,'2017-01-30','100','50' FROM dual UNION ALL
SELECT '3',4,'2017-01-30','50','0' FROM dual UNION ALL
SELECT '3',5,'2017-01-30','60','40' FROM dual
)
SELECT DISTINCT id,
some_date,
LAST_VALUE(ending_balance) OVER (PARTITION BY id ORDER BY trans_seq_num RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) amount_due
FROM sample
ORDER BY 1,2,3;
ID SOME_DATE AMOUNT_DUE
----- --------------- ---------------
1 2017-01-02 0
2 2017-01-15 35
3 2017-01-30 40
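If an ordering column does exist, finding the last row per id is straightforward. This sketch, in SQLite via Python's sqlite3, assumes a hypothetical trans_seq_num that numbers each id's rows in the order they appear in the question, and uses LAST_VALUE with a ROWS frame that spans the whole partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER, trans_seq_num INTEGER, some_date TEXT, "
    "total_amount INTEGER, adj_amount INTEGER)"
)
# trans_seq_num is hypothetical: rows numbered in the order they appear in the question
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2017-01-02", 100, 50),
        (1, 2, "2017-01-02", 50, 0),
        (2, 1, "2017-01-15", 100, 35),
        (2, 2, "2017-01-15", 35, 0),
        (3, 1, "2017-01-30", 120, 50),
        (3, 2, "2017-01-30", -120, -50),
        (3, 3, "2017-01-30", 100, 50),
        (3, 4, "2017-01-30", 50, 0),
        (3, 5, "2017-01-30", 60, 40),
    ],
)

rows = conn.execute(
    """
    SELECT DISTINCT id,
           some_date,
           -- frame covers the whole partition, so LAST_VALUE sees the final row
           LAST_VALUE(adj_amount) OVER (
               PARTITION BY id ORDER BY trans_seq_num
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS amount_due
      FROM transactions
     ORDER BY id
    """
).fetchall()
for row in rows:
    print(row)
```

With the rows in the question's order, this gives 0, 0 and 40, which matches the expected output in the question.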
As the others already said, you need some way of numbering the rows; a simple sequence will do the job. With such a unique column the solution is trivial: we only need to find the last row for each id.
But you have no ordering. Here is my attempt, which looks OK so far and may help temporarily:
with q as (
select table_a.*,
row_number() over (partition by id, date_, total_amount, adj_amount
order by null) rn
from table_a),
t as (
select a.*,
row_number() over (partition by id, date_, total_amount
order by null) r1,
row_number() over (partition by id, date_, adj_amount
order by null) r2
from q a
where not exists (
select 1 from q b
where a.id = b.id and a.date_ = b.date_ and a.rn = b.rn
and a.total_amount = -b.total_amount and a.adj_amount = -b.adj_amount))
select id, date_, max(adj_amount) due
from t
where connect_by_isleaf = 1
connect by prior id = id and prior date_ = date_
and prior adj_amount = total_amount and prior r2 = r1
group by id, date_;
First I eliminate the mistakes. Subquery t does this: it is a simple NOT EXISTS, with row_number added to handle repeated cases properly (like (120, 50) => (-120, -50) and then (120, 50) again).
With the data cleaned, we can recursively find connected rows, linking each row to one whose total_amount equals the previous row's adj_amount. We have to use row_number again to handle identical rows, as in (60, 40) => (40, 0) => (60, 40) again.
Then only the leaves are kept, and finally we take the maximum value of these leaves, which should contain the orphaned non-zero value for each id, if one exists. You can add SYS_CONNECT_BY_PATH() to the select list to check whether the rows are connected properly.
Hierarchical queries are slower than most others, so be warned if your table is big, and filter the data first if needed.
This query works for your examples and some others that I imagined and tested. But even if it works, you should add an ordering column (if possible) so that you have a guaranteed, simple way to obtain correct results.

Combining multiple scalar BigQuery queries into a single query to generate one table

I have a BigQuery query that basically takes a date as a parameter and calculates the number of active users our app had near that date.
Right now, if I want to make a graph of active users over a year, I have to run the query 12 times (once per month) and collate the results manually, which is error-prone and time-consuming.
Is there a way to make a single BigQuery query that runs the subquery 12 times and puts the results on 12 different rows?
For example, if my query is
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-01-01'
How can I get a table like
| Date | Count |
|------------|---------|
| 2017-01-01 | 50000 |
| 2017-02-01 | 40000 |
| 2017-03-01 | 30000 |
| 2017-04-01 | 20000 |
| 2017-05-01 | 10000 |
Supposing that you have a column called date and one called user_id and you want to calculate distinct users on a monthly basis, you can run a query such as:
#standardSQL
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
(Here you can replace YourTable with the subquery that you want to run.) As a self-contained example:
#standardSQL
WITH YourTable AS (
SELECT DATE '2017-06-25' AS date, 10 AS user_id UNION ALL
SELECT DATE '2017-05-04', 11 UNION ALL
SELECT DATE '2017-06-20', 10 UNION ALL
SELECT DATE '2017-04-01', 11 UNION ALL
SELECT DATE '2017-06-02', 12 UNION ALL
SELECT DATE '2017-04-13', 10
)
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
Elliot taught me UNION ALL and it seemed to do the trick:
SELECT date '2017-01-01' AS month_start, COUNT(*) AS active_count FROM MyTable WHERE activityTime < date '2017-01-01'
UNION ALL
SELECT date '2017-02-01', COUNT(*) FROM MyTable WHERE activityTime < date '2017-02-01'
UNION ALL
SELECT date '2017-03-01', COUNT(*) FROM MyTable WHERE activityTime < date '2017-03-01'
ORDER BY month_start
Maybe there's a nicer way to parameterize the dates in the WHERE clause, but this did the trick for me.
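One way to avoid writing a UNION ALL branch per month is to generate the month starts as rows and compute the count for each generated date. The sketch below shows the idea in SQLite via Python's sqlite3, with a recursive CTE producing the dates and a correlated subquery doing the count; the activityTime values are hypothetical. In BigQuery, GENERATE_DATE_ARRAY(DATE '2017-01-01', DATE '2017-12-01', INTERVAL 1 MONTH) combined with UNNEST could play the role of the recursive CTE, but that variant is not shown here and may need adjusting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (user_id INTEGER, activityTime TEXT)")
# hypothetical activity data
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "2016-12-15"), (2, "2017-01-10"), (3, "2017-01-20"), (4, "2017-02-05"), (5, "2017-03-30")],
)

rows = conn.execute(
    """
    WITH RECURSIVE months(month_start) AS (
        SELECT :first_month                    -- first date parameter
        UNION ALL
        SELECT date(month_start, '+1 month')   -- next month start
          FROM months
         WHERE month_start < :last_month       -- stop after the last date
    )
    SELECT month_start,
           (SELECT COUNT(*) FROM MyTable WHERE activityTime < month_start) AS active_count
      FROM months
     ORDER BY month_start
    """,
    {"first_month": "2017-01-01", "last_month": "2017-04-01"},
).fetchall()
for row in rows:
    print(row)
```

Only the two bound parameters change when you want a different range; each generated month start becomes one row of the result.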