Count total customer_id's not partitioned by column - sql

I would like to calculate the total number of customers without adding an additional subquery. The count should not be partitioned by country, but rather by the month_ column.
EDIT:
I updated the query to use GROUPING SETS
Current query:
select date_trunc('month',date_) as month_,
country,
count(distinct customer_id) as total_customers
from table_a
GROUP BY GROUPING SETS (
(date_trunc('month',date_), country),
(date_trunc('month',date_))
)
Current output:
month_ country total_customers_per_country
2020-01-01 US 320
2020-01-01 GB 360
2020-01-01 NULL 680
2020-02-01 US 345
2020-02-01 GB 387
2020-02-01 NULL 732
Desired output:
month_ country total_customers_per_country total_customers
2020-01-01 US 320 680
2020-01-01 GB 360 680
2020-02-01 US 345 732
2020-02-01 GB 387 732

This may depend on the version of SQL Server you are using, but you are likely looking for "window" functions.
I believe something along the lines of the following will give you the result you are looking for:
select date_trunc('month',date_) as month_,
country,
count(distinct customer_id) as total_customers_by_country,
count(distinct customer_id) OVER (partition by date_trunc('month',date_)) as total_customers
from table_a
https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15

You can use a subquery to group by each month-country pair and then apply sum() over a window partitioned by month:
-- sample data
WITH dataset (id, date, country) AS (
VALUES (1, date '2020-01-01', 'US'),
(2, date '2020-01-01', 'US'),
(1, date '2020-01-01', 'GB'),
(3, date '2020-01-02', 'US'),
(1, date '2020-01-02', 'GB'),
(1, date '2020-02-01', 'US')
)
--query
select *,
sum(total_customers_per_country) over (partition by month) total_customers
from (
select date_trunc('month', date) as month,
country,
count(distinct id) total_customers_per_country
from dataset
group by 1, country
)
order by month, country desc
Output:
month | country | total_customers_per_country | total_customers
:--------- | :------ | --------------------------: | --------------:
2020-01-01 | US | 3 | 4
2020-01-01 | GB | 1 | 4
2020-02-01 | US | 1 | 1
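To check the subquery-plus-window approach locally, here is a minimal sketch using Python's built-in sqlite3 module on the same sample rows. It assumes SQLite 3.25 or newer (for window functions) and uses strftime('%Y-%m-01', ...) in place of date_trunc, which SQLite does not have. It also runs a plain COUNT(DISTINCT id) per month for comparison: customer 1 appears in both US and GB in January, so the summed total (4) is higher than the number of distinct customers that month (3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER, date TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "US"),
        (2, "2020-01-01", "US"),
        (1, "2020-01-01", "GB"),
        (3, "2020-01-02", "US"),
        (1, "2020-01-02", "GB"),
        (1, "2020-02-01", "US"),
    ],
)

# Inner query: distinct customers per (month, country).
# Outer query: SUM(...) OVER (PARTITION BY month) adds the per-country
# counts up per month without collapsing the rows.
rows = conn.execute(
    """
    SELECT month, country, total_customers_per_country,
           SUM(total_customers_per_country) OVER (PARTITION BY month) AS total_customers
    FROM (
        SELECT strftime('%Y-%m-01', date) AS month,
               country,
               COUNT(DISTINCT id) AS total_customers_per_country
        FROM dataset
        GROUP BY month, country
    )
    ORDER BY month, country DESC
    """
).fetchall()
for row in rows:
    print(row)

# For comparison: each customer counted once per month, regardless of country.
distinct_per_month = dict(
    conn.execute(
        "SELECT strftime('%Y-%m-01', date), COUNT(DISTINCT id) "
        "FROM dataset GROUP BY 1"
    ).fetchall()
)
print(distinct_per_month)
```

The window sum is only equal to the true distinct total when no customer appears in more than one country in the same month.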


Count of On-Going Transaction in BigQuery

I have this table:
book_name | borrow_date | return_date
:-------- | :---------- | :----------
A | 2022-08-01 | 2022-08-03
B | 2022-08-03 | 2022-09-01
C | 2022-08-15 | 2022-09-25
D | 2022-09-15 | 2022-09-18
E | 2022-09-17 | 2022-10-15
And a table of the first date of each month:
summary_month
2022-08-01
2022-09-01
2022-10-01
I would like to count how many books are currently borrowed based on the summary_month. The result I am looking for is:
summary_month | count_book | list_book
:------------ | ---------: | :--------
2022-08-01 | 3 | A,B,C
2022-09-01 | 4 | B,C,D,E
2022-10-01 | 1 | E
I am stuck: I am only able to aggregate them by the borrow date, with this query:
count(distinct case when summary_month = date_trunc(borrow_date,month) then book_name end) count_book
Is it possible to get the result I am hoping for? I would really appreciate any help and advice. Thank you.
Consider the option below:
select summary_month,
count(distinct book_name) as count_book,
string_agg(book_name) as list_book
from your_table, unnest(generate_date_array(
date_trunc(borrow_date, month),
date_trunc(return_date, month),
interval 1 month)
) as summary_month
group by summary_month
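SQLite has no generate_date_array, so the following sketch swaps in a recursive CTE to expand each loan into every month from its borrow month to its return month, then counts per month. It runs through Python's sqlite3 module (SQLite 3.25+ assumed); the table name loans is a placeholder. As with string_agg without an ORDER BY, group_concat does not guarantee the order of the listed books.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (book_name TEXT, borrow_date TEXT, return_date TEXT)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [
        ("A", "2022-08-01", "2022-08-03"),
        ("B", "2022-08-03", "2022-09-01"),
        ("C", "2022-08-15", "2022-09-25"),
        ("D", "2022-09-15", "2022-09-18"),
        ("E", "2022-09-17", "2022-10-15"),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE loan_months(book_name, summary_month, last_month) AS (
        -- Anchor: the month in which each book was borrowed.
        SELECT book_name,
               date(borrow_date, 'start of month'),
               date(return_date, 'start of month')
        FROM loans
        UNION ALL
        -- Step: advance one month until the return month is reached.
        SELECT book_name, date(summary_month, '+1 month'), last_month
        FROM loan_months
        WHERE summary_month < last_month
    )
    SELECT summary_month,
           COUNT(DISTINCT book_name) AS count_book,
           group_concat(DISTINCT book_name) AS list_book
    FROM loan_months
    GROUP BY summary_month
    ORDER BY summary_month
    """
).fetchall()
for summary_month, count_book, list_book in rows:
    print(summary_month, count_book, list_book)
```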
If applied to the sample data in your question, it returns the month counts from your desired output (3, 4 and 1).
Something like this can work:
with
input as (
select 'A' book_name, cast('2022-08-01' as date) borrow_date , cast('2022-08-03' as date) return_date union all
select 'B', '2022-08-03', '2022-09-01' union all
select 'C', '2022-08-15', '2022-09-25' union all
select 'D', '2022-09-15', '2022-09-18' union all
select 'E', '2022-09-17', '2022-10-15'
),
list_month as (
select distinct
* except(days_borrowed),
date_trunc(days_borrowed, month) as month
from input,
unnest(generate_date_array(borrow_date, return_date)) as days_borrowed
)
select
month,
count(distinct book_name) as count_distinct_book,
string_agg(distinct book_name) as book_name_list
from list_month
group by 1
order by 1
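Because the question already has a table of month starts, the same rule can also be expressed as a range join: a book counts toward a summary_month when that month lies between the truncated borrow month and the truncated return month. A minimal SQLite sketch, again via Python's sqlite3; the table names loans and summary are placeholders, and the LEFT JOIN keeps months with no borrowed books (count 0).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (book_name TEXT, borrow_date TEXT, return_date TEXT)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [
        ("A", "2022-08-01", "2022-08-03"),
        ("B", "2022-08-03", "2022-09-01"),
        ("C", "2022-08-15", "2022-09-25"),
        ("D", "2022-09-15", "2022-09-18"),
        ("E", "2022-09-17", "2022-10-15"),
    ],
)
conn.execute("CREATE TABLE summary (summary_month TEXT)")
conn.executemany(
    "INSERT INTO summary VALUES (?)",
    [("2022-08-01",), ("2022-09-01",), ("2022-10-01",)],
)

# A loan is "ongoing" in a summary month if that month falls between
# the month it was borrowed and the month it was returned (inclusive).
rows = conn.execute(
    """
    SELECT s.summary_month,
           COUNT(DISTINCT l.book_name) AS count_book,
           group_concat(DISTINCT l.book_name) AS list_book
    FROM summary AS s
    LEFT JOIN loans AS l
      ON s.summary_month BETWEEN date(l.borrow_date, 'start of month')
                             AND date(l.return_date, 'start of month')
    GROUP BY s.summary_month
    ORDER BY s.summary_month
    """
).fetchall()
for summary_month, count_book, list_book in rows:
    print(summary_month, count_book, list_book)
```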

Add missing month in result with values from previous month

I have a result set with month as the first column. Some of the months are missing from the result. I need to fill each missing month with the previous month's record, up to the last month.
Current data:
Desired Output:
I have a SQL query, but instead of filling in just the missing months, it takes every row into account and populates from it.
select
to_char(generate_series(date_trunc('MONTH',to_date(period,'YYYYMMDD')+interval '1' month),
date_trunc('MONTH',now()+interval '1' day),
interval '1' month) - interval '1 day','YYYYMMDD') as period,
name,age,salary,rating
from( values ('20201205','Alex',35,100,'A+'),
('20210110','Alex',35,110,'A'),
('20210512','Alex',35,999,'A+'),
('20210625','Jhon',20,175,'B-'),
('20210922','Jhon',20,200,'B+')) v (period,name,age,salary,rating) order by 2,3,4,5,1;
Output of this query:
Can someone help me get the desired output?
You can achieve this with a recursive CTE like this:
with RECURSIVE ctetest as (SELECT * FROM (values ('2020-12-31'::date,'Alex',35,100,'A+'),
('2021-01-31'::date,'Alex',35,110,'A'),
('2021-05-31'::date,'Alex',35,999,'A+'),
('2021-06-30'::date,'Jhon',20,175,'B-'),
('2021-09-30'::date,'Jhon',20,200,'B+')) v (mth, emp, age, salary, rating)),
cte AS (
SELECT MIN(mth) AS mth, emp, age, salary, rating
FROM ctetest
GROUP BY emp, age, salary, rating
UNION
SELECT COALESCE(n.mth, (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date), COALESCE(n.emp, l.emp),
COALESCE(n.age, l.age), COALESCE(n.salary, l.salary), COALESCE(n.rating, l.rating)
FROM cte l
LEFT OUTER JOIN ctetest n ON n.mth = (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date
AND n.emp = l.emp
WHERE (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date <= (SELECT MAX(mth) FROM ctetest)
)
SELECT * FROM cte order by 2, 1;
Note that although ctetest is not itself recursive (it is only used to supply the test data), if any CTE in a WITH clause with multiple CTEs is recursive, you must put the RECURSIVE keyword after WITH.
You can use cross join lateral to fill the gaps and then union all with the original data.
WITH the_table (period, name, age, salary, rating) as ( values
('2020-12-01'::date, 'Alex', 35, 100, 'A+'),
('2021-01-01'::date, 'Alex', 35, 110, 'A'),
('2021-05-01'::date, 'Alex', 35, 999, 'A+'),
('2021-06-01'::date, 'Jhon', 20, 100, 'B-'),
('2021-09-01'::date, 'Jhon', 20, 200, 'B+')
),
t as (
select *, coalesce(
lead(period) over (partition by name order by period) - interval 'P1M',
max(period) over ()
) last_period
from the_table
)
SELECT lat::date period, name, age, salary, rating
from t
cross join lateral generate_series
(period + interval 'P1M', last_period, interval 'P1M') lat
UNION ALL
SELECT * from the_table
ORDER BY name, period;
Please note that using an integer data type for a date column is suboptimal. Consider reviewing your data design and using the date data type instead; you can still present it as an integer if necessary.
period | name | age | salary | rating
:--------- | :--- | --: | -----: | :-----
2020-12-01 | Alex | 35 | 100 | A+
2021-01-01 | Alex | 35 | 110 | A
2021-02-01 | Alex | 35 | 110 | A
2021-03-01 | Alex | 35 | 110 | A
2021-04-01 | Alex | 35 | 110 | A
2021-05-01 | Alex | 35 | 999 | A+
2021-06-01 | Alex | 35 | 999 | A+
2021-07-01 | Alex | 35 | 999 | A+
2021-08-01 | Alex | 35 | 999 | A+
2021-09-01 | Alex | 35 | 999 | A+
2021-06-01 | Jhon | 20 | 100 | B-
2021-07-01 | Jhon | 20 | 100 | B-
2021-08-01 | Jhon | 20 | 100 | B-
2021-09-01 | Jhon | 20 | 200 | B+
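SQLite does not support LATERAL joins, so this sketch keeps the same LEAD-based last_period calculation but generates the gap months with a recursive CTE instead. It runs through Python's sqlite3 module and assumes SQLite 3.25+; dates are stored as ISO text, so string comparison matches date order. On the sample rows it reproduces the 14 rows in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (period TEXT, name TEXT, age INTEGER, salary INTEGER, rating TEXT)"
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?, ?)",
    [
        ("2020-12-01", "Alex", 35, 100, "A+"),
        ("2021-01-01", "Alex", 35, 110, "A"),
        ("2021-05-01", "Alex", 35, 999, "A+"),
        ("2021-06-01", "Jhon", 20, 100, "B-"),
        ("2021-09-01", "Jhon", 20, 200, "B+"),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE t AS (
        -- Last month each row should cover: the month before the same
        -- person's next record, or the overall last month if there is none.
        SELECT *,
               COALESCE(
                   date(LEAD(period) OVER (PARTITION BY name ORDER BY period), '-1 month'),
                   MAX(period) OVER ()
               ) AS last_period
        FROM the_table
    ),
    filled AS (
        -- Anchor: the original rows.
        SELECT period, name, age, salary, rating, last_period FROM t
        UNION ALL
        -- Step: copy the row into the following month until last_period.
        SELECT date(period, '+1 month'), name, age, salary, rating, last_period
        FROM filled
        WHERE period < last_period
    )
    SELECT period, name, age, salary, rating
    FROM filled
    ORDER BY name, period
    """
).fetchall()
for row in rows:
    print(row)
```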

What SQL query can be used to limit continuous periods by parameter value, and then to calculate datediff inside them?

I have a table of phone calls consisting of user_id, call_date and city, where city can be either A or B.
It looks like this:
user_id | call_date | city
:------ | :--------- | :---
1 | 2021-01-01 | A
1 | 2021-01-02 | B
1 | 2021-01-03 | B
1 | 2021-01-05 | B
1 | 2021-01-10 | A
1 | 2021-01-12 | B
1 | 2021-01-16 | A
2 | 2021-01-17 | A
2 | 2021-01-20 | B
2 | 2021-01-22 | B
2 | 2021-01-23 | A
2 | 2021-01-24 | B
2 | 2021-01-26 | B
2 | 2021-01-30 | A
For this table, we need to select for each user all the periods when he was in city B.
These periods are counted in days and start when the first call is made from city B, and end as soon as the next call is made from city A.
So for user_id = 1, the first period starts on 2021-01-02 and ends on 2021-01-10. There can be several such periods for each user.
The result should be the following table:
user_id | period_1 | period_2
:------ | -------: | -------:
1 | 8 | 4
2 | 3 | 6
Can you please tell me how I can limit the periods according to the condition of the problem, and then calculate the datediff within each period?
Thank you
This is a typical gaps-and-islands problem. You need to group consecutive rows first, then find the first call_date of the next group. Sample code for Postgres is below; it can be adapted to another DBMS by using the appropriate function to calculate the difference in days.
with a (user_id, call_date, city)
as (
select *
from ( values
('1', date '2021-01-01', 'A'),
('1', date '2021-01-02', 'B'),
('1', date '2021-01-03', 'B'),
('1', date '2021-01-05', 'B'),
('1', date '2021-01-10', 'A'),
('1', date '2021-01-12', 'B'),
('1', date '2021-01-16', 'A'),
('2', date '2021-01-17', 'A'),
('2', date '2021-01-20', 'B'),
('2', date '2021-01-22', 'B'),
('2', date '2021-01-23', 'A'),
('2', date '2021-01-24', 'B'),
('2', date '2021-01-26', 'B'),
('2', date '2021-01-30', 'A')
) as t
)
, grp as (
/*Identify groups*/
select a.*,
/*Group consecutive rows: an unbroken run of calls
from the same city gets the same difference between
the two rankings below, and the difference changes
whenever the city changes.
*/
dense_rank() over(
partition by user_id
order by call_date asc
) -
dense_rank() over(
partition by user_id, city
order by call_date asc
) as grp,
/*Get next call date*/
lead(call_date, 1, call_date)
over(
partition by user_id
order by call_date asc
) as next_dt
from a
)
select
user_id,
city,
min(call_date) as dt_from,
max(next_dt) as dt_to,
max(next_dt) - min(call_date) as diff
from grp
where city = 'B'
group by user_id, grp, city
order by 1, 3
user_id | city | dt_from | dt_to | diff
:------ | :--- | :--------- | :--------- | ---:
1 | B | 2021-01-02 | 2021-01-10 | 8
1 | B | 2021-01-12 | 2021-01-16 | 4
2 | B | 2021-01-20 | 2021-01-23 | 3
2 | B | 2021-01-24 | 2021-01-30 | 6
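The same gaps-and-islands steps can be tried in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed for window functions). julianday() replaces Postgres date subtraction, the CTE is renamed islands so it does not share a name with the grp column, and the last few lines gather each user's diffs into a list in the same order as the period_1, period_2 columns of the desired output.

```python
import sqlite3

calls = [
    (1, "2021-01-01", "A"), (1, "2021-01-02", "B"), (1, "2021-01-03", "B"),
    (1, "2021-01-05", "B"), (1, "2021-01-10", "A"), (1, "2021-01-12", "B"),
    (1, "2021-01-16", "A"), (2, "2021-01-17", "A"), (2, "2021-01-20", "B"),
    (2, "2021-01-22", "B"), (2, "2021-01-23", "A"), (2, "2021-01-24", "B"),
    (2, "2021-01-26", "B"), (2, "2021-01-30", "A"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (user_id INTEGER, call_date TEXT, city TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?)", calls)

rows = conn.execute(
    """
    WITH islands AS (
        SELECT a.*,
               -- Constant within a run of consecutive calls from one city.
               DENSE_RANK() OVER (PARTITION BY user_id ORDER BY call_date)
             - DENSE_RANK() OVER (PARTITION BY user_id, city ORDER BY call_date) AS grp,
               -- Date of the user's next call (own date for the last call).
               LEAD(call_date, 1, call_date)
                   OVER (PARTITION BY user_id ORDER BY call_date) AS next_dt
        FROM a
    )
    SELECT user_id,
           MIN(call_date) AS dt_from,
           MAX(next_dt) AS dt_to,
           CAST(julianday(MAX(next_dt)) - julianday(MIN(call_date)) AS INTEGER) AS diff
    FROM islands
    WHERE city = 'B'
    GROUP BY user_id, grp
    ORDER BY user_id, dt_from
    """
).fetchall()

# Collect the periods per user, in chronological order.
periods = {}
for user_id, _dt_from, _dt_to, diff in rows:
    periods.setdefault(user_id, []).append(diff)
print(periods)
```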

Aggregate values if value wasn't seen before in group - SQL / ORACLE

I'm trying to do this in Oracle, but generic SQL works too. I'm wondering whether there are any simple functions or techniques for this; in theory I know how to do it in Python (see my example below).
Basically, I'm trying to compute a total distinct count, say monthly, of a unique identifier (let's use "customer_id"), but customers should only be added to the total if they were not seen in prior months.
If customer 1 was seen in January and again in March, they would only be in the January total and counted once.
The grand total would be the total number of unique customers.
In Python you would keep a list and check whether the customer is already in it. If they are, nothing happens; if they are not, they get appended to the list and added to the total. The example below only computes the overall total of unique values, and it would have to do this per month, but in theory this is what I want:
def count_unique(customers):
    l = []
    total = 0
    for i in customers:
        if i in l:
            pass
        else:
            l.append(i)
            total += 1
    return total

customers = [12, 123, 1234, 12345, 123455]
print(count_unique(customers))  # 5
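The same idea extends to the monthly version described above: walk the sales in month order and credit each customer only to the first month in which they appear. This is a minimal Python sketch; the (customer_id, month) input shape and the function name new_customers_per_month are assumptions, and it uses a set instead of a list so the membership check does not scan the whole collection. The sample pairs mirror the Oracle sample data used in the answer below.

```python
from collections import Counter


def new_customers_per_month(sales):
    """Count, per month, the customers not seen in any earlier month.

    `sales` is an iterable of (customer_id, month) pairs with months as
    "YYYY-MM" strings, so sorting the strings gives chronological order.
    """
    seen = set()        # customers already counted in an earlier month
    counts = Counter()  # month -> number of first-time customers
    for customer_id, month in sorted(sales, key=lambda sale: sale[1]):
        if customer_id not in seen:
            seen.add(customer_id)
            counts[month] += 1
    return dict(counts)


sales = [
    (1, "2021-01"), (1, "2021-02"), (1, "2021-03"), (1, "2021-04"), (1, "2021-05"),
    (2, "2021-01"),
    (3, "2021-03"), (3, "2021-04"), (3, "2021-05"),
    (4, "2021-04"), (4, "2021-05"),
]
monthly = new_customers_per_month(sales)
print(monthly)                # {'2021-01': 2, '2021-03': 1, '2021-04': 1}
print(sum(monthly.values()))  # grand total of unique customers: 4
```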
Now that I'm typing this out and thinking about it more, though, I would do a subquery of each unique customer and the min(date) of their sales. Simply doing this:
select count(distinct customer_id), month
from sales
group by month
This doesn't work, because each unique customer is counted in every month they appear in. But if I did
select count(customer_id), month
from
(select customer_id, min(month) as month
from sales
group by customer_id)
group by month
Would that work, since it only counts each customer in their first sale month? Is there an easier way to do this, or does this approach make sense?
You appear to want to find the first occurrence of each customer_id; you can use an analytic function for that and then filter on the first occurrence:
SELECT customer_id,
month
FROM (
SELECT customer_id,
month,
ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY month ) AS rn
FROM sales
)
WHERE rn = 1;
Which, for the sample data:
CREATE TABLE sales ( customer_id, month ) AS
SELECT 1, DATE '2021-01-01' FROM DUAL UNION ALL
SELECT 1, DATE '2021-02-01' FROM DUAL UNION ALL
SELECT 1, DATE '2021-03-01' FROM DUAL UNION ALL
SELECT 1, DATE '2021-04-01' FROM DUAL UNION ALL
SELECT 1, DATE '2021-05-01' FROM DUAL UNION ALL
SELECT 2, DATE '2021-01-01' FROM DUAL UNION ALL
SELECT 3, DATE '2021-03-01' FROM DUAL UNION ALL
SELECT 3, DATE '2021-04-01' FROM DUAL UNION ALL
SELECT 3, DATE '2021-05-01' FROM DUAL UNION ALL
SELECT 4, DATE '2021-04-01' FROM DUAL UNION ALL
SELECT 4, DATE '2021-05-01' FROM DUAL;
Outputs:
CUSTOMER_ID | MONTH
----------: | :--------
1 | 01-JAN-21
2 | 01-JAN-21
3 | 01-MAR-21
4 | 01-APR-21
If you want to count, for each month, the customers who have not been seen before, then just take the previous query and aggregate:
SELECT COUNT(customer_id) AS number_of_new_customers,
month
FROM (
SELECT customer_id,
month,
ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY month ) AS rn
FROM sales
)
WHERE rn = 1
GROUP BY month
ORDER BY month;
Which, for the same sample data, outputs:
NUMBER_OF_NEW_CUSTOMERS | MONTH
----------------------: | :--------
2 | 01-JAN-21
1 | 01-MAR-21
1 | 01-APR-21
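To compare the two approaches, here is a sketch that runs both the ROW_NUMBER query above and the question's MIN(month) subquery (here with an explicit alias on MIN(month), which the outer GROUP BY needs) against the same sample data in SQLite through Python's sqlite3. Months are stored as ISO date strings rather than Oracle DATE values, which is an assumption of the sketch. On this data both return the same per-month counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, month TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        (1, "2021-01-01"), (1, "2021-02-01"), (1, "2021-03-01"),
        (1, "2021-04-01"), (1, "2021-05-01"),
        (2, "2021-01-01"),
        (3, "2021-03-01"), (3, "2021-04-01"), (3, "2021-05-01"),
        (4, "2021-04-01"), (4, "2021-05-01"),
    ],
)

# Answer's approach: keep each customer's first row via ROW_NUMBER, then count.
row_number_counts = conn.execute(
    """
    SELECT month, COUNT(customer_id)
    FROM (
        SELECT customer_id, month,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY month) AS rn
        FROM sales
    )
    WHERE rn = 1
    GROUP BY month
    ORDER BY month
    """
).fetchall()

# Question's approach: first month per customer via MIN, then count.
min_month_counts = conn.execute(
    """
    SELECT month, COUNT(customer_id)
    FROM (SELECT customer_id, MIN(month) AS month FROM sales GROUP BY customer_id)
    GROUP BY month
    ORDER BY month
    """
).fetchall()

print(row_number_counts)
print(min_month_counts)
```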

Query to aggregate data from table into another table based on number of entries of an id column

I have a table like this:
order_id start_date end_date amount corrected_amount
1 2020-01-01 2020-01-31 100 95
1 2020-02-01 2020-02-28 200 200
1 2020-03-01 2020-03-30 100 100
1 2020-10-01 2020-11-25 200 95
2 2020-01-01 2020-05-30 500 250
3 2020-01-01 2020-12-31 400 5
I am trying to create a query that aggregates this into a smaller table with just one row per order_id, summing the amounts according to a few rules that I am having trouble implementing.
When only one entry exists, as for ids 2 and 3, I want to return just the order_id, start_date, end_date and the value from the amount column.
When multiple entries exist, as for id 1, I want to return the order_id, the minimum start_date, the maximum end_date, and an amount equal to the sum of corrected_amount over the rows whose end_date is earlier than today's date, plus the amount of the rows whose end_date is later than today.
So for the table above, the result would look like this:
order_id start_date end_date amount
1 2020-01-01 2020-11-25 595
2 2020-01-01 2020-05-30 500
3 2020-01-01 2020-12-31 400
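The rules above amount to conditional aggregation, and the result depends on what "today" is. A minimal SQLite sketch via Python's sqlite3 (CASE in place of IF; the table name orders and the today parameter are placeholders for the real table and CURRENT_DATE) shows that the expected 595 for order 1 comes out when today falls after 2020-03-30 and on or before 2020-11-25. With a later date such as 2021-01-01, every row of order 1 has ended and the total becomes 490.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, start_date TEXT, end_date TEXT, "
    "amount INTEGER, corrected_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-01", "2020-01-31", 100, 95),
        (1, "2020-02-01", "2020-02-28", 200, 200),
        (1, "2020-03-01", "2020-03-30", 100, 100),
        (1, "2020-10-01", "2020-11-25", 200, 95),
        (2, "2020-01-01", "2020-05-30", 500, 250),
        (3, "2020-01-01", "2020-12-31", 400, 5),
    ],
)


def aggregate(today):
    """One row per order_id; `today` is an ISO date string standing in for CURRENT_DATE."""
    return conn.execute(
        """
        SELECT order_id,
               MIN(start_date) AS first_start,
               MAX(end_date) AS last_end,
               CASE WHEN COUNT(*) > 1
                    -- Several rows: corrected_amount for periods already ended,
                    -- amount for the rest.
                    THEN SUM(CASE WHEN end_date < :today
                                  THEN corrected_amount ELSE amount END)
                    -- Single row: the plain amount.
                    ELSE SUM(amount)
               END AS total_amount
        FROM orders
        GROUP BY order_id
        ORDER BY order_id
        """,
        {"today": today},
    ).fetchall()


print(aggregate("2020-11-01"))
print(aggregate("2021-01-01"))
```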
Consider using IF:
WITH TestData AS (
SELECT 1 as order_id, DATE('2020-01-01') as start_date, DATE('2020-01-31') as end_date, 100 as amount, 95 as corrected_amount UNION ALL
SELECT 1, DATE('2020-02-01'), DATE('2020-02-28'), 200, 200 UNION ALL
SELECT 1, DATE('2020-03-01'), DATE('2020-03-30'), 100, 100 UNION ALL
SELECT 1, DATE('2020-10-01'), DATE('2020-11-25'), 200, 95 UNION ALL
SELECT 2, DATE('2020-01-01'), DATE('2020-05-30'), 500, 250 UNION ALL
SELECT 3, DATE('2020-01-01'), DATE('2020-12-31'), 400, 5
)
SELECT order_id,
MIN(start_date) AS start_date,
MAX(end_date) AS end_date,
IF(COUNT(*) > 1,
SUM(IF(end_date < CURRENT_DATE(), corrected_amount, amount)),
SUM(amount)
) as amount
FROM TestData
GROUP BY order_id
The result is: