My task is to count the number of cars rented per month during a specified year.
I have two tables, one called cars and one called rental.
The cars table has (car_id, type, monthly_cost).
The rental table has (rental_id, car_id, person_id, rental_date, return_date).
The problem is that I can count the number of rentals per month based only on rental_date, but that only gives me new rentals.
For example, a rental with rental_date 2020-02-04 and return_date 2020-05-05 needs to be counted in February, March, April and May.
select extract(month from rental_date) as month, count(*)
from rental
where extract(year from rental_date) = 2020
group by extract(month from rental_date);
This is my query for counting "new rentals".
One approach uses generate_series() to generate the first day of every month of that year, then brings in the rental table with a left join on rental periods that overlap each month:
select d.dt, count(r.rental_id) as cnt_rentals
from generate_series(date '2020-01-01', date '2020-12-01', '1 month') d(dt)
left join rental r
on r.rental_date < d.dt + interval '1 month'
and r.return_date >= d.dt
group by d.dt
Note that this properly handles rentals that cross the beginning of the year, while your original code did not. It also returns months without any rentals, with a count of 0.
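As a minimal sketch, the same overlap test can be checked locally with Python's built-in sqlite3 module. SQLite has no generate_series() in a default build, so this version builds the twelve month starts with a recursive CTE instead. Dates are assumed to be stored as ISO 'YYYY-MM-DD' text, and the function name rentals_per_month and the sample rows are hypothetical.

```python
import sqlite3


def rentals_per_month(rows, year):
    """Count rentals that overlap each month of `year`.

    rows: iterable of (rental_id, rental_date, return_date), dates as
    'YYYY-MM-DD' strings (an assumption: SQLite has no native DATE type).
    Returns {month_number: count}, including months with 0 rentals.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table rental (rental_id integer, rental_date text, return_date text)")
    con.executemany("insert into rental values (?, ?, ?)", rows)
    sql = """
    with recursive months(dt) as (            -- first day of each month
        select date(:y || '-01-01')
        union all
        select date(dt, '+1 month') from months where dt < date(:y || '-12-01')
    )
    select cast(strftime('%m', m.dt) as integer) as month,
           count(r.rental_id) as cnt_rentals  -- 0 when the left join finds nothing
    from months m
    left join rental r
      on r.rental_date < date(m.dt, '+1 month')  -- starts before the month ends
     and r.return_date >= m.dt                    -- ends on/after the month starts
    group by m.dt
    order by m.dt
    """
    result = dict(con.execute(sql, {"y": str(year)}).fetchall())
    con.close()
    return result
```

For the February-to-May rental from the question, months 2 through 5 each get a count of 1. A rental from 2019-12-20 to 2020-01-10 is counted in January, which is the year-boundary case mentioned above.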
I want to calculate a month-to-date profit as of the 15th, so that it sums the profit in each month up to the 15th day. Is there any way to reset the cumulative sum (running total) every month in BigQuery? I want the window function to restart on the first day of each month, so that the profit_cumulative column resets.
So I want the result to look like this:
Date        Categories  Profit  Profit_Cumulative
2022-06-14  A           295.62  6350.58
2022-06-15  A           459.80  6810.38
2022-07-01  A           501.03  501.03
2022-07-02  A           258.97  760.0
Instead of this:
Date        Categories  Profit  Profit_Cumulative
2022-06-14  A           295.62  6350.58
2022-06-15  A           459.80  6810.38
2022-07-01  A           501.03  7311.72
2022-07-02  A           258.97  7570.69
And this is my code:
WITH
b AS (
WITH
a AS (
SELECT
DATE_TRUNC(DATE(created_at), DAY) AS date_,
EXTRACT(YEAR FROM created_at) AS year,
EXTRACT(MONTH FROM created_at) AS month,
EXTRACT(DAY FROM created_at) AS day,
SAFE_SUBTRACT(retail_price, cost) AS profit,
products.category AS product_category
FROM
`bigquery-public-data.thelook_ecommerce.order_items` orderitems
INNER JOIN
`bigquery-public-data.thelook_ecommerce.products` products
ON
orderitems.product_id = products.id
AND created_at >= '2022-06-01 00:00:00 UTC'
AND created_at <= '2022-08-15 23:59:59 UTC'
GROUP BY
date_,
year,
month,
day,
product_category,
profit )
SELECT
a.date_ AS Date,
a.year,
a.month,
a.day,
a.product_category AS Product_Categories,
SUM(a.profit) AS Profit
FROM
a
WHERE
a.day <= 15
GROUP BY
a.date_,
a.year,
a.month,
a.day,
a.product_category
ORDER BY
a.date_,
year,
month,
day,
a.product_category)
SELECT
Date,
b.year,
b.month,
b.day,
b.Product_Categories,
b.profit,
SUM(Profit) OVER(PARTITION BY product_categories ORDER BY date) AS profit_cumulative
FROM
b
I think you are nearly there!
You just need to add the month to your window's PARTITION BY clause, like so:
SUM(Profit) OVER(PARTITION BY product_categories, month ORDER BY date) AS profit_cumulative
I'd solve it with this one line of code:
SUM(Profit) OVER(PARTITION BY month, product_categories ORDER BY date) AS profit_cumulative
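Add month to your PARTITION BY clause so that the cumulative sum resets on the first day of every month. The order of the columns inside PARTITION BY does not matter. If your data spans more than one year, partition by year as well (or by DATE_TRUNC(date, MONTH)) so that the same month in different years is not summed together.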
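Here is a minimal sketch of the reset, using Python's sqlite3 module as a stand-in for BigQuery (window functions need SQLite 3.25 or later). It partitions by category plus the 'YYYY-MM' prefix of the date, which plays the role of DATE_TRUNC(Date, MONTH) and also keeps different years apart. The table b, the column date_, the function name and the sample rows are hypothetical. The profits are taken from the question.

```python
import sqlite3


def month_to_date_cumulative(rows):
    """rows: (date_ 'YYYY-MM-DD', category, profit).

    Returns (date_, category, profit, profit_cumulative) where the running
    total restarts at the first row of every calendar month, per category.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table b (date_ text, product_categories text, profit real)")
    con.executemany("insert into b values (?, ?, ?)", rows)
    sql = """
    select date_, product_categories, profit,
           sum(profit) over (
               partition by product_categories,
                            substr(date_, 1, 7)   -- 'YYYY-MM', i.e. year + month
               order by date_
           ) as profit_cumulative
    from b
    order by product_categories, date_
    """
    out = con.execute(sql).fetchall()
    con.close()
    return out
```

The July rows restart at 501.03, as in the expected output. The June totals are lower than 6350.58 here only because the sample leaves out the earlier June rows.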
I have the following table called vacations, where the employee number is displayed along with the start and end date of their vacations:
id_employee  start       end
1001         2020-12-24  2021-01-04
What I want is to see the number of vacation days each employee took, broken down by employee number, year and month, without counting non-business days (Saturdays, Sundays and holidays).
I have the following query, which already excludes Saturdays and Sundays from the count:
SELECT id_employee,
EXTRACT(YEAR FROM t.Date) AS year,
EXTRACT(MONTH FROM t.Date) AS month,
SUM(WEEKDAY(`Date`) < 5) AS days
FROM (SELECT v.id_employee,
DATE_ADD(v.start, interval s.seq - 1 DAY) AS Date
FROM vacations v CROSS JOIN seq_1_to_100 s
WHERE DATE_ADD(v.start, interval s.seq - 1 DAY) <= v.end
ORDER BY v.id_employee, v.start, s.seq ) t
GROUP BY id_employee, EXTRACT(YEAR_MONTH FROM t.Date);
My question is: in addition to skipping weekends, how can I also skip holidays? I suppose I should create another table that stores the holiday dates, but how could my query be adapted to perform the comparison?
If employee 1001 took vacation from 2020-12-24 to 2021-01-04 and we treat Christmas and New Year's Day as holidays, we should get the following result:
id_employee  month  year  days
1001         12     2020  5
1001         1      2021  1
Once you have created a table that stores the holiday dates, you can probably do something like this:
SELECT id_employee,
EXTRACT(YEAR FROM t.Date) AS year,
EXTRACT(MONTH FROM t.Date) AS month,
SUM(CASE WHEN h.holiday_date IS NULL THEN WEEKDAY(`Date`) < 5 END) AS days
FROM (SELECT v.id_employee,
DATE_ADD(v.start, interval s.seq - 1 DAY) AS Date
FROM vacations v CROSS JOIN seq_1_to_100 s
WHERE DATE_ADD(v.start, interval s.seq - 1 DAY) <= v.end
ORDER BY v.id_employee, v.start, s.seq ) t
LEFT JOIN holidays h ON t.date=h.holiday_date
GROUP BY id_employee, EXTRACT(YEAR_MONTH FROM t.Date);
Assuming that the holidays table structure would be something like this:
CREATE TABLE holidays (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
holiday_date DATE,
holiday_description VARCHAR(255));
Then LEFT JOIN it to your current query and change the SUM() slightly by adding a CASE expression. If ON t.date = h.holiday_date finds a match, h.holiday_date contains the holiday date; otherwise it is NULL. The CASE therefore returns the weekday test (WEEKDAY(`Date`) < 5) only when h.holiday_date IS NULL, and NULL for holidays, which SUM() ignores.
Here is an additional solution, compatible with both MariaDB and the MySQL versions that support (recursive) common table expressions:
WITH RECURSIVE cte AS
(SELECT id_employee, start, start lvdt, end FROM vacations
UNION ALL
SELECT id_employee, start, lvdt+INTERVAL 1 DAY, end FROM cte
WHERE lvdt+INTERVAL 1 DAY <=end)
SELECT id_employee,
YEAR(v.lvdt) AS year,
MONTH(v.lvdt) AS month,
SUM(CASE WHEN h.holiday_date IS NULL THEN WEEKDAY(v.lvdt) < 5 END) AS days
FROM cte v
LEFT JOIN holidays h
ON v.lvdt=h.holiday_date
GROUP BY id_employee,
YEAR(v.lvdt),
MONTH(v.lvdt);
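To check the expected result (5 days in December 2020, 1 in January 2021), here is a minimal sketch of the recursive-CTE approach in SQLite via Python's sqlite3 module. It is an adaptation, not the MariaDB query itself: strftime('%w', ...) replaces WEEKDAY() (in SQLite, 0 is Sunday and 6 is Saturday), and the function name business_vacation_days is hypothetical.

```python
import sqlite3


def business_vacation_days(vacations, holidays):
    """vacations: (id_employee, start, end) as 'YYYY-MM-DD' strings.
    holidays: list of 'YYYY-MM-DD' strings.
    Returns [(id_employee, year, month, days)] counting only weekdays
    that are not holidays.
    """
    con = sqlite3.connect(":memory:")
    # "end" is an SQL keyword, so it is quoted as an identifier.
    con.execute('create table vacations (id_employee integer, start text, "end" text)')
    con.execute("create table holidays (holiday_date text)")
    con.executemany("insert into vacations values (?, ?, ?)", vacations)
    con.executemany("insert into holidays values (?)", [(h,) for h in holidays])
    sql = """
    with recursive cte(id_employee, lvdt, end_date) as (   -- one row per vacation day
        select id_employee, start, "end" from vacations
        union all
        select id_employee, date(lvdt, '+1 day'), end_date from cte
        where date(lvdt, '+1 day') <= end_date
    )
    select v.id_employee,
           cast(strftime('%Y', v.lvdt) as integer) as year,
           cast(strftime('%m', v.lvdt) as integer) as month,
           sum(case when h.holiday_date is null                   -- not a holiday
                     and strftime('%w', v.lvdt) not in ('0', '6')  -- not Sun/Sat
                    then 1 else 0 end) as days
    from cte v
    left join holidays h on v.lvdt = h.holiday_date
    group by v.id_employee, strftime('%Y-%m', v.lvdt)
    order by v.id_employee, strftime('%Y-%m', v.lvdt)
    """
    out = con.execute(sql).fetchall()
    con.close()
    return out
```

With 2020-12-25 and 2021-01-01 as holidays, December keeps the 24th and the 28th to 31st, and January keeps only the 4th.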
I want to count how many new customers I get per month. I know I could take each customer's minimum purchase date, but the problem is that a customer who had already purchased and then stopped buying for more than a year is considered a new customer again.
Could you help me count how many new customers I have per month? That is, customers whose minimum purchase date falls in that month and who did not buy anything in the year before that date.
I tried the code below, but if a customer made their first purchase in February 2019 and their next purchase in March 2020, it only counts the February purchase, when the customer should be counted as new in both February 2019 and March 2020.
select to_char(B.fp, 'YYYY-MM') month, count(B.email)
from(
select A.email, A.first_purchase fp
from(
select email, date(min(created)) first_purchase
from "Order_table" oo
group by email)A
where A.first_purchase >= (A.first_purchase + INTERVAL '-1 year'))B
group by 1
Use lag():
-- compare each purchase with the same customer's previous purchase
select to_char(ot.created, 'YYYY-MM') as yyyymm,
count(*) filter (where ot.created > ot.prev_created + interval '1 year' or ot.prev_created is null) as cnt_new
from (select ot.*, lag(ot.created) over (partition by ot.email order by ot.created) as prev_created
from "Order_table" ot
) ot
group by yyyymm
order by yyyymm;
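Here is a minimal sketch of the same lag() idea in SQLite through Python's sqlite3 module, using the scenario from the question (a purchase in February 2019 followed by one in March 2020). SQLite's date(x, '+1 year') stands in for PostgreSQL's + interval '1 year', and a CASE inside SUM() replaces the FILTER clause. The table name order_table, the function name and the e-mail addresses are hypothetical.

```python
import sqlite3


def new_customers_per_month(orders):
    """orders: (email, created 'YYYY-MM-DD').

    A purchase counts as a 'new customer' event when it is the customer's
    first purchase or comes more than one year after their previous one.
    Returns {'YYYY-MM': count}.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table order_table (email text, created text)")
    con.executemany("insert into order_table values (?, ?)", orders)
    sql = """
    select strftime('%Y-%m', ot.created) as yyyymm,
           sum(case when ot.prev_created is null                        -- first purchase
                      or ot.created > date(ot.prev_created, '+1 year')  -- gap over a year
                    then 1 else 0 end) as cnt_new
    from (select o.*,
                 lag(o.created) over (partition by o.email order by o.created) as prev_created
          from order_table o) ot
    group by yyyymm
    order by yyyymm
    """
    out = dict(con.execute(sql).fetchall())
    con.close()
    return out
```

The customer who returns after more than a year is counted in both February 2019 and March 2020, while a repeat purchase within the year (May 2019 below) contributes 0.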
I would like to find all customers whose rentals last 10 days or more. My solution doesn't give any syntax error, but it returns an incorrect total count of customers. Here is my solution:
with rental_history as (
select
customer_id
,rental_date
,return_date
,rental_date + interval '10 day' as ban_date
,coalesce(return_date, now())-rental_date as days_out
from rental
)
select count(*) as number_of_lost_rentals
from rental_history where days_out >= interval '10 day'
I am getting an incorrect count.
For this type of scenario you need to understand two things:
If you subtract two dates, the result is the difference in days as an integer.
If you subtract two timestamps, the result is an interval.
Based on your question and comments, and considering that your return_date and rental_date fields are timestamps, you should write your query like this:
select
count(*)
from rental
where return_date::date - rental_date::date>=10
But the query above counts rentals across all time. If you want the count of pending (not yet returned) rentals that have been out for 10 days or more as of today, try this:
select
count(*)
from rental
where current_date - rental_date::date>=10 and return_date is null;
If you want to count customers who have a rental exceeding 10 days, then:
select count(distinct customer_id)
from rental
where rental_date <= coalesce(return_date, current_date) - interval '10 day';
If you want to count the days across all rentals for a customer -- which is how I would interpret the question -- then you need aggregation:
select count(*)
from (select customer_id,
sum(coalesce(return_date, current_date)::date - rental_date::date) as num_days
from rental r
group by customer_id
) c
where num_days >= 10;
Note: It is also unclear from your question if someone who rents on 2020-09-01 and returns on 2020-09-01 counts as 1 day or 2 days.
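The two interpretations can be compared with a minimal sketch in SQLite via Python's sqlite3 module. SQLite has no date subtraction operator, so julianday(a) - julianday(b) stands in for PostgreSQL's date - date, and a fixed today argument replaces current_date so that the result is reproducible. The function name and sample rentals are hypothetical.

```python
import sqlite3


def customers_over_10_days(rentals, today):
    """rentals: (customer_id, rental_date, return_date or None), ISO dates.
    `today` stands in for current_date.

    Returns (customers with at least one rental of >= 10 days,
             customers whose days summed over all rentals are >= 10).
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table rental (customer_id integer, rental_date text, return_date text)")
    con.executemany("insert into rental values (?, ?, ?)", rentals)
    # Days out for one rental; an unreturned rental is measured up to `today`.
    days = "julianday(coalesce(return_date, :today)) - julianday(rental_date)"
    params = {"today": today}
    any_long = con.execute(
        f"select count(distinct customer_id) from rental where {days} >= 10", params
    ).fetchone()[0]
    total_long = con.execute(
        f"""select count(*) from (
                select customer_id, sum({days}) as num_days
                from rental group by customer_id
            ) c where num_days >= 10""",
        params,
    ).fetchone()[0]
    con.close()
    return any_long, total_long
```

Customer 2 below has two 5-day rentals, so it is counted only under the summed interpretation. Customer 3 has not returned yet and is measured up to today.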
I am trying to count the monthly number of merchants (and the total transaction amount they've processed) who have made at least 4 transactions in each month of the last 2 years, from a table containing daily transactions by merchant.
My query is as follows:
SELECT trx.month, COUNT(trx.merchants), SUM(trx.amount)
FROM
(
SELECT
DATE_TRUNC('month', transactions.payment_date) AS month,
merchants,
COUNT(DISTINCT payment_id) AS volume,
SUM(transactions.payment_amount) AS amount
FROM transactions
WHERE transactions.date >= NOW() - INTERVAL '2 years'
GROUP BY 1, 2
) AS trx
WHERE trx.volume >= 4
GROUP BY trx.month;
My question is: will this query pull the right data? If so, is this the most efficient way of writing it or can I improve the performance of this query?
First of all, we must think about the time range. You say that you want at least four transactions each month in the last 24 months. But you certainly don't require this for, say, October 2018 when running the query on October 10, 2018. Nor do you want to look at only the last twenty days of October 2016. We would want to look at the complete months from October 2016 through September 2018.
Next we want to make sure that a merchant had at least four transactions each month. In other words: they had transactions each month and the minimum number of transactions per month was four. We can use window functions to run over monthly transactions to check this.
select merchants, month, volume, amount
from
(
select
merchants,
date_trunc('month', payment_date) as month,
count(distinct payment_id) as volume,
sum(payment_amount) as amount,
count(*) over (partition by merchants) number_of_months,
min(count(distinct payment_id)) over (partition by merchants) min_volume
from transactions
where date between date_trunc('month', current_date) - interval '24 months'
and date_trunc('month', current_date) - interval '1 days'
group by merchants, date_trunc('month', payment_date)
) monthly
where number_of_months = 24
and min_volume >= 4
order by merchants, month;
This gives you the list of merchants fulfilling the requirements with their monthly data. If you want the number of merchants instead, then aggregate. E.g.
select count(distinct merchants), sum(amount) as total
from (...) monthly
where number_of_months = 24 and min_volume >= 4;
or
select month, count(distinct merchants), sum(amount) as total
from (...) monthly
where number_of_months = 24 and min_volume >= 4
group by month
order by month;
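Here is a minimal sketch of the same window-function check, run in SQLite through Python's sqlite3 module. It computes the monthly aggregates in one CTE and applies count(*) over (...) and min(volume) over (...) in a second step instead of nesting an aggregate inside a window function. It also uses a short window of n_months months so that the test data stays small. Months are compared as 'YYYY-MM' text, and all names and rows are hypothetical.

```python
import sqlite3


def merchants_active_every_month(transactions, first_month, last_month, n_months, min_volume=4):
    """transactions: (merchants, payment_id, payment_date 'YYYY-MM-DD', payment_amount).

    Keeps merchants with at least `min_volume` distinct payments in each of
    the `n_months` complete months first_month..last_month ('YYYY-MM') and
    returns [(month, number_of_merchants, total_amount)].
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table transactions (merchants text, payment_id text, "
                "payment_date text, payment_amount real)")
    con.executemany("insert into transactions values (?, ?, ?, ?)", transactions)
    sql = """
    with monthly as (                       -- one row per merchant and month
        select merchants, substr(payment_date, 1, 7) as month,
               count(distinct payment_id) as volume,
               sum(payment_amount) as amount
        from transactions
        where substr(payment_date, 1, 7) between :first_month and :last_month
        group by merchants, substr(payment_date, 1, 7)
    ), flagged as (                         -- window functions over the monthly rows
        select monthly.*,
               count(*) over (partition by merchants) as number_of_months,
               min(volume) over (partition by merchants) as min_volume
        from monthly
    )
    select month, count(distinct merchants), sum(amount)
    from flagged
    where number_of_months = :n_months and min_volume >= :min_volume
    group by month
    order by month
    """
    params = {"first_month": first_month, "last_month": last_month,
              "n_months": n_months, "min_volume": min_volume}
    out = con.execute(sql, params).fetchall()
    con.close()
    return out
```

A merchant that drops to three payments in one month, or skips a month entirely, is excluded from every month, which is the "each month" requirement described above.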
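To get only the list of merchants, you could use HAVING to filter on the aggregated values: the number of distinct months and the number of distinct payment_ids. (Note that COUNT(DISTINCT payment_id) >= 4 here applies to the whole two-year window, not to each individual month.)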
SELECT merchants
FROM transactions
WHERE transactions.date >= NOW() - INTERVAL '2 years'
GROUP BY merchants
having count(distinct DATE_TRUNC('month', transactions.payment_date)) =24
and COUNT(DISTINCT payment_id) >= 4
And for your updated question, just a suggestion:
You could join with the query above that returns the qualifying merchants (24 distinct months, at least 4 payments), and filter the aggregated monthly result directly in the subquery using HAVING:
SELECT trx.month, COUNT(trx.merchants), SUM(trx.amount)
FROM (
SELECT DATE_TRUNC('month', transactions.payment_date) AS month
, transactions.merchants
, COUNT(DISTINCT payment_id) AS volume
, SUM(transactions.payment_amount) AS amount
FROM transactions
INNER JOIN (
SELECT merchants
FROM transactions
WHERE transactions.date >= NOW() - INTERVAL '2 years'
GROUP BY merchants
having count(distinct DATE_TRUNC('month', transactions.payment_date)) =24
and COUNT(DISTINCT payment_id) >= 4
) A on A.merchants = transactions.merchants
WHERE transactions.date >= NOW() - INTERVAL '2 years'
GROUP BY 1, 2
-- PostgreSQL does not accept the output alias "volume" in HAVING
HAVING COUNT(DISTINCT payment_id) >= 4
) AS trx
GROUP BY trx.month;