I am looking to calculate an average (over number of occurrences) and observation count over increasing dates per instance (take customer as an example instance) in Oracle SQL.
So the count will increase as date goes up, the average could go up or down.
I can do it for an individual case and a fixed time interval, but I would like to see a series for every customer, with every row a separate date where a sale occurred. Right now, I have a single row per customer. Here is the SQL summarizing the average and count for a fixed time interval:
SELECT AVG(bought_usd) as avg_bought
, COUNT(*) as num_of_interactions
, cust_id
FROM salesTable
WHERE obsdate >= DATE('2000-01-01')
AND obsdate <= DATE('2022-01-01')
GROUP BY cust_id
So for an input of:
the output should look like:
Use analytic functions:
SELECT "DATE",
cust,
AVG(bought_usd) OVER (PARTITION BY cust ORDER BY "DATE") AS avg,
COUNT(*) OVER (PARTITION BY cust ORDER BY "DATE") AS cnt
FROM salestable
ORDER BY cust, "DATE"
Note: DATE is a reserved word. You should not use it as an identifier.
Which, for the sample data:
CREATE TABLE salestable ("DATE", cust, bought_usd) AS
SELECT DATE '2010-10-01', 'Cust A', 100 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust A', 50 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust B', 120 FROM DUAL UNION ALL
SELECT DATE '2011-10-01', 'Cust B', 180 FROM DUAL;
Outputs:
DATE
CUST
AVG
CNT
2010-10-01 00:00:00
Cust A
100
1
2010-12-18 00:00:00
Cust A
75
2
2010-12-18 00:00:00
Cust B
120
1
2011-10-01 00:00:00
Cust B
150
2
db<>fiddle here
Table "client_orders":
date
ordered
id
28.05
50
1
23.06
60
2
24.05
50
1
25.06
130
2
Table "stock":
id
amount
date
1
60
23.04
2
90
25.04
1
10
24.04
2
10
24.06
I want to calculate the amount I need to order (to fulfill the stock) for what date. For instance, it should be:
30 by 28.05 (60+10-50-50=-30) for id = 1
-90 by 25.06 (90-60+10-130=-90) for id = 2
I tried to do it with LAG function, but the problem is that the stock here is not updating.
SELECT *,
SUM(amount - ordered) OVER (PARTITION BY sd.id ORDER BY d.date ASC)
FROM stock sd
LEFT JOIN (SELECT date,
id,
ordered
FROM client_orders) AS d
ON sd.id = d.id
Couldn't find anything similar on the web. Grateful if you share articles/examples how to do that.
You could make a union of the two tables and sum all stock amounts with the negative of ordered amounts. For the date you could instead take the corresponding maximum value.
SELECT id,
SUM(amount),
MAX(date)
FROM (SELECT id,
-ordered AS amount,
date
FROM client_orders
UNION
SELECT *
FROM stock
) stock_and_orders
GROUP BY id
Try it here.
I have the following table
order_id
created_at
customer_id
1
2020-01-02
11
2
2020-02-03
12
3
2020-02-03
11
I would like to add a column "customer_orders_count" that will assign the number of orders that a customer passed to each transaction, ie obtain this table :
order_id
created_at
customer_id
customer_orders_count
1
2020-01-02
11
1
2
2020-02-03
12
1
2
2020-02-03
11
2
My problem it's I can't find how to calculated a local "customer_orders_count" dependind on each order, I only managed to add a column with the global "customer_orders_count" and for example for the first row order_id=1 I'll get customer_orders_count=2 whereas I'll like to be 1.
Does anyone has and idea ?
Use cumulative count:
with mytable as (
select 1 as order_id, date '2020-01-02' as created_at, 11 as customer_id union all
select 2, '2020-02-03', 12 union all
select 3 , '2020-02-03', 11
)
select *, count(*) over (partition by customer_id order by created_at) as customer_orders_count
from mytable
order by order_id
Use row_number():
select t.*,
row_number() over (partition by customer_id order by created_at) as customer_order_count
from t;
This is subtly different from using a cumulative count(). This version guarantees that the numbers for a given customer are never duplicated, even when the dates are the same. A cumulative count has no such guarantee.
I have some data look like this
id date total amount adj amount
1 2017-01-02 100 50
1 2017-01-02 50 0
2 2017-01-15 100 35
2 2017-01-15 35 0
3 2017-01-30 120 50
3 2017-01-30 -120 -50
3 2017-01-30 100 50
3 2017-01-30 50 0
3 2017-01-30 60 40
the output should look like, I have no clue how to do the subtraction between rows and columns.
id date due amount
1 2017-01-02 0
2 2017-01-15 0
3 2017-01-30 40
here is my current code, but it only works on maybe 1 and 2 but definitely not working for 3.
the logic for this part is to find the due amount between each entry for each id. for example, id 1 has two entry, total amount 100, then he paid 50, so the adj amount is 50, and the second entry, the total amount is 50, he paid 50, te adj amount is 0. so id 1 due amount is 0 in the end.
id 3 who has 5 entries, first there is entry show the total amount for ID 3 is 120 and he paid 70, so the adj amount is 50, but the first entry is a mistake, so all amount revised. then the third entry shows the total amount is 100, ID 3 paid 50, so the adj amount is 50. then the fourth entry shows the total amount is 50, ID 3 also paid 50, so the adj amount is 0. and the fifth entry shows that the total amount is 60, and ID 3 paid 20, so the adj amount is 40. so in final, ID 3 due amount is 40;
select distinct a.id,
a.date,
case when a.date=b.date and a.total_amount = b.adj_amount then a.adj_amount
when a.date=b.date and a.total_amount <> b.adj_amount then ABS(a.adj_amount + b.adj_amount)
else a.adj_amount
end as due_amount
from table a,
table b
where a.id=b.id;
I just wonder if there has any function which can do this kind of calculation between rows and columns.
Use GROUP BY and SUM().
SELECT the_date, SUM(due_amount)
FROM tab
GROUP BY the_date;
Something like this could work - if the transactions can be ordered. Note that I've renamed some of the columns to help clarify their meaning. I've also added a trans_seq_num column to indicate the order of a customer's transactions on a particular date. I think you're looking for the amount that the customer still owes as of their last payment.
WITH sample (id, trans_seq_num, some_date, starting_balance, ending_balance) AS
(
SELECT '1',1,'2017-01-02','100','50' FROM dual UNION ALL
SELECT '1',2,'2017-01-02','50','0' FROM dual UNION ALL
SELECT '2',1,'2017-01-15','35','0' FROM dual UNION ALL
SELECT '2',2,'2017-01-15','100','35' FROM dual UNION ALL
SELECT '3',1,'2017-01-30','120','50' FROM dual UNION ALL
SELECT '3',2,'2017-01-30','-120','-50' FROM dual UNION ALL
SELECT '3',3,'2017-01-30','100','50' FROM dual UNION ALL
SELECT '3',4,'2017-01-30','50','0' FROM dual UNION ALL
SELECT '3',5,'2017-01-30','60','40' FROM dual
)
SELECT DISTINCT id,
some_date,
LAST_VALUE(ending_balance) OVER (PARTITION BY id ORDER BY trans_seq_num RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) day_balance
FROM sample
ORDER BY 1,2,3;
ID SOME_DATE AMOUNT_DUE
----- --------------- ---------------
1 2017-01-02 0
2 2017-01-15 35
3 2017-01-30 40
The others already said: you should have any way of numbering rows. Simple sequence will do the job. With such unique column solution is trivial, we only find last row for each id.
But you have no order. Here is my try which looks OK so far and may temporary help:
with q as (
select table_a.*,
row_number() over (partition by id, date_, total_amount, adj_amount
order by null) rn
from table_a),
t as (
select a.*,
row_number() over (partition by id, date_, total_amount
order by null) r1,
row_number() over (partition by id, date_, adj_amount
order by null) r2
from q a
where not exists (
select 1 from q b
where a.id = b.id and a.date_ = b.date_ and a.rn = b.rn
and a.total_amount = -b.total_amount and a.adj_amount = -b.adj_amount))
select id, date_, max(adj_amount) due
from t
where connect_by_isleaf = 1
connect by prior id = id and prior date_ = date_
and prior adj_amount = total_amount and prior r2 = r1
group by id, date_;
dbfiddle
First I eliminate mistakes. Subquery t does this, it is simple not exists with added row_number to handle properly multiple cases ( like (120, 50) => (-120, -50) and again (120, 50) ).
Data is cleared so we can recursively find connected rows by previous adj_amount = total_amount. We have to use row_numbers again to handle identical rows (60, 40) => (40, 0) => (60, 40) again.
Then only leaves are taken and finally max value of these leaves which should contain orphaned non zero value if such exists for each id. You can add connect_by_path() clause to see if connection works properly.
Hierarchical queries are slower than others, so if your table is big, be warned. Filter data at first, if needed.
This query works for your examples and some others which I imagined and tested. But even if it works you should add ordering column (if possible) and have guaranteed, simple way to obtain correct results.
I have a BiqQuery query that basically takes a date as a parameter and calculates the number of active users our app had near that date.
Right now, if I want to make a graph over a year of active users, I have to run the query 12 times (once per month) and collate the results manually, which is error-prone and time consuming.
Is there a way to make a single bigquery query that runs the subquery 12 times and puts the results on 12 different rows?
For example, if my query is
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-01-01'
How can I get a table like
| Date | Count |
|------------|---------|
| 2017-01-01 | 50000 |
| 2017-02-01 | 40000 |
| 2017-03-01 | 30000 |
| 2017-04-01 | 20000 |
| 2017-05-01 | 10000 |
Supposing that you have a column called date and one called user_id and you want to calculate distinct users on a monthly basis, you can run a query such as:
#standardSQL
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
(Here you can replace YourTable with the subquery that you want to run). As a self-contained example:
#standardSQL
WITH YourTable AS (
SELECT DATE '2017-06-25' AS date, 10 AS user_id UNION ALL
SELECT DATE '2017-05-04', 11 UNION ALL
SELECT DATE '2017-06-20', 10 UNION ALL
SELECT DATE '2017-04-01', 11 UNION ALL
SELECT DATE '2017-06-02', 12 UNION ALL
SELECT DATE '2017-04-13', 10
)
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
Elliot taught me UNION ALL and it seemed to do the trick:
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-01-01'
UNION ALL
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-02-01'
UNION ALL
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-03-01'
Maybe there's a nicer way to parameterize the dates in the WHERE clause, but this did the trick for me.