Count and sum per day multiple tables without join - sql

I want to count the number of orders, invoices and deliveries, and sum their amounts, per day.
Like this:
date  nb orders  orders$  nb delivery
day1  5          1234,56  3
day2  6          665,88   7
..
My first attempt works for a single day but not for a range such as a week:
SELECT
(SELECT COUNT(OPP.OPPNUM_0) FROM OPPOR OPP WHERE OPP.CREDAT_0=%1%),
(SELECT SUM(OPP.OPPAMT_0) FROM OPPOR OPP WHERE OPP.CREDAT_0=%1%),
(SELECT COUNT(SQH.SQHNUM_0) FROM SQUOTE SQH WHERE SQH.CREDAT_0=%1%),
(SELECT SUM(SQH.YCUMHTSEL_0) FROM SQUOTE SQH WHERE SQH.CREDAT_0=%1%),
(SELECT COUNT(SOH.SOHNUM_0) FROM SORDER SOH WHERE SOH.CREDAT_0=%1%),
(SELECT SUM(SOH.ORDNOT_0) FROM SORDER SOH WHERE SOH.CREDAT_0=%1%)
FROM dual

In MySQL you can filter on the last day with an interval expression:
SELECT COUNT(*) FROM table_name WHERE anydatefield >= NOW() - INTERVAL 1 DAY
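For the per-day totals asked about in the question, one way to avoid joining the tables is to stack them with UNION ALL and aggregate once by date. The sketch below is a minimal illustration run through Python's sqlite3 module. The table and column names come from the question, but the sample rows, the output aliases (nb_opp, amt_opp, nb_ord, amt_ord) and the date range are assumptions, and the syntax may need adjusting for the actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OPPOR  (OPPNUM_0 TEXT, OPPAMT_0 REAL, CREDAT_0 TEXT);
CREATE TABLE SORDER (SOHNUM_0 TEXT, ORDNOT_0 REAL, CREDAT_0 TEXT);
-- Made-up sample rows.
INSERT INTO OPPOR VALUES ('O1', 100.0, '2024-01-01'), ('O2', 50.0, '2024-01-01'),
                         ('O3', 25.0, '2024-01-02');
INSERT INTO SORDER VALUES ('S1', 10.0, '2024-01-02'), ('S2', 20.0, '2024-01-03');
""")

# Each source contributes one row per document; the columns that belong
# to the other source are filled with 0 so a single GROUP BY can sum them.
QUERY = """
SELECT day,
       SUM(nb_opp)  AS nb_opp,
       SUM(amt_opp) AS amt_opp,
       SUM(nb_ord)  AS nb_ord,
       SUM(amt_ord) AS amt_ord
FROM (
    SELECT CREDAT_0 AS day, 1 AS nb_opp, OPPAMT_0 AS amt_opp, 0 AS nb_ord, 0 AS amt_ord
    FROM OPPOR
    UNION ALL
    SELECT CREDAT_0, 0, 0, 1, ORDNOT_0
    FROM SORDER
) AS stacked
WHERE day BETWEEN ? AND ?
GROUP BY day
ORDER BY day
"""

rows = conn.execute(QUERY, ("2024-01-01", "2024-01-07")).fetchall()
for row in rows:
    print(row)
```

A third table such as SQUOTE would be added as another UNION ALL branch with its own pair of columns. Days on which no table has a row do not appear in the result.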

Related

How to join partitioned table with another one

Sorry for the newbie question, but I'm really having trouble with the following issue:
Say, I have this code in place:
WITH active_pass AS (SELECT DATE_TRUNC(fr.day, MONTH) AS month, id,
CASE
WHEN SUM(fr.imps) > 100 THEN 1
ELSE 0 -- includes a sum of exactly 100
END AS active_or_passive
FROM table1 AS fr
WHERE day between (CURRENT_DATE() - 730) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
GROUP BY month, id
ORDER BY month desc),
# summing the score for each customer (sum for the whole year)
active_pass_assigned AS (SELECT id, month,
SUM(SUM(active_or_passive)) OVER (PARTITION BY id ORDER BY month rows BETWEEN 3 PRECEDING AND 1 PRECEDING) AS trailing_act
FROM active_pass AS a
GROUP BY month, id
ORDER BY MONTH desc)
It creates a trailing total over the last 3 months to show in how many of those months the customer was active. However, I have no idea how to join this with the next table to get the sum of revenue that the customer generated. What I tried is this:
SELECT c.id, DATE_TRUNC(day, MONTH) AS month, SUM(revenue) AS Rev, name
FROM table2 AS c
JOIN active_pass_assigned AS a
ON c.id = a.id
WHERE day between (CURRENT_DATE() - 365) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
GROUP BY month, id, name
ORDER BY month DESC
However, it returns waaay higher values for Revenue than the actual ones and I have no idea why. Furthermore, could you please tell me how to join those two tables together so that I only get the customer's revenue on the months his activity was equal to 3?
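A likely cause of the inflated revenue is that the join matches on id only, so every row of table2 is repeated once for each month row that the customer has in active_pass_assigned. The sketch below reproduces that effect and then shows one possible fix: aggregate revenue per customer and month first, join on both id and month, and keep the months where trailing_act = 3. It runs on SQLite through Python's sqlite3 module; the sample rows are made up, active_pass_assigned is stored as a plain table standing in for the CTE, and monthly_rev is a hypothetical name. In BigQuery the month would be computed with DATE_TRUNC(day, MONTH) instead of strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for the active_pass_assigned CTE: one row per customer and month.
CREATE TABLE active_pass_assigned (id INTEGER, month TEXT, trailing_act INTEGER);
INSERT INTO active_pass_assigned VALUES
    (1, '2024-01-01', 3), (1, '2024-02-01', 2), (1, '2024-03-01', 3);
-- Stand-in for table2: daily revenue rows (made-up data).
CREATE TABLE table2 (id INTEGER, day TEXT, revenue REAL);
INSERT INTO table2 VALUES
    (1, '2024-01-10', 100.0), (1, '2024-02-15', 40.0), (1, '2024-03-20', 60.0);
""")

# Joining on id alone repeats each revenue row once per month row (3 times here).
inflated = conn.execute("""
    SELECT SUM(c.revenue)
    FROM table2 AS c
    JOIN active_pass_assigned AS a ON c.id = a.id
""").fetchone()[0]

# Aggregate revenue per customer and month first, then join on id AND month.
fixed = conn.execute("""
    WITH monthly_rev AS (
        SELECT id, strftime('%Y-%m-01', day) AS month, SUM(revenue) AS rev
        FROM table2
        GROUP BY id, strftime('%Y-%m-01', day)
    )
    SELECT r.id, r.month, r.rev
    FROM monthly_rev AS r
    JOIN active_pass_assigned AS a
      ON a.id = r.id AND a.month = r.month
    WHERE a.trailing_act = 3
    ORDER BY r.month
""").fetchall()

print(inflated)  # three times the true total of 200.0
print(fixed)
```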

Retrieve Customers with a Monthly Order Frequency greater than 4

I am trying to optimize the below query to help fetch all customers in the last three months who have a monthly order frequency +4 for the past three months.
Customer ID  Feb  Mar  Apr
0001         4    5    6
0002         3    2    4
0003         4    2    3
In the above table, the customer with Customer ID 0001 should only be picked, as he consistently has 4 or more orders in a month.
Below is a query I have written. It pulls all customers with an average purchase frequency of 4 in the last 90 days, but it does not check that the customer placed 4 or more orders in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code) = "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, but I am not sure whether they are needed in this case.
I would count the orders per month instead of computing the average, and then keep the customers whose count is at least 4 in each of the last three months.
Also, I think you should select your interval using "month(CURRENT_DATE()) - 3" instead of a 90-day window. If needed, you should also handle the case where the current date falls in January, February or March, and then go back to October, November and December of the previous year.
I'm not familiar with Google BigQuery so I can't write your query but I hope this helps.
So I've found a solution to this using a WITH clause, as below:
WITH filtered_orders AS (
select
distinct customer_id ID,
extract(MONTH from date) Order_Month,
count(order_id) CountofOrders
from customer_order_lines lines
where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
group by ID, Order_Month
having CountofOrders>=4)
select distinct ID
from filtered_orders
group by ID
having count(Order_Month) =3;
Hope this helps!
Another option is to first count the orders per month and then keep the customers whose count reaches your threshold in every month they appear in:
WITH ORDERS_BY_MONTH AS (
SELECT
DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
lines.customer_id Customer_ID,
COUNT(lines.order_id) PurchaseFrequency
FROM fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code) = "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY PurchaseMonth, Customer_ID
)
SELECT
Customer_ID,
AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)
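One detail worth checking with this approach: a month in which a customer placed no orders produces no row at all, so comparing only the rows that exist does not by itself guarantee three qualifying months. The sketch below, run on SQLite through Python's sqlite3 module, requires both a row for each of the three months and a minimum monthly count of 4. The orders_by_month table is a hypothetical pre-aggregated stand-in for the CTE, the counts for customers 0001 to 0003 come from the question's table, and customer 0004 is made up to show the gap case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_by_month (customer_id TEXT, month TEXT, orders INTEGER)")
conn.executemany(
    "INSERT INTO orders_by_month VALUES (?, ?, ?)",
    [
        ("0001", "2022-02", 4), ("0001", "2022-03", 5), ("0001", "2022-04", 6),
        ("0002", "2022-02", 3), ("0002", "2022-03", 2), ("0002", "2022-04", 4),
        ("0003", "2022-02", 4), ("0003", "2022-03", 2), ("0003", "2022-04", 3),
        # Made-up customer: busy in two months, no orders at all in April.
        ("0004", "2022-02", 5), ("0004", "2022-03", 7),
    ],
)

consistent = conn.execute("""
    SELECT customer_id
    FROM orders_by_month
    GROUP BY customer_id
    HAVING COUNT(*) = 3        -- a row exists for each of the three months
       AND MIN(orders) >= 4    -- and even the weakest month reaches the threshold
    ORDER BY customer_id
""").fetchall()

print(consistent)
```

Without the COUNT(*) = 3 condition, customer 0004 would also be returned.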

Weekly active or lapsing status in BigQuery

I want to see the status of a customer each week based on their activity.
If a customer has transacted in the last 7 days it should appear as active and if the customer has not transacted in 8-21 days it should appear as "lapsing".
I have these values in my table (shown as an image in the original post, not reproduced here).
Desired output reference:
Week# Customer_id Status
If you want a row for every combination of week and customer_id, you could cross join the distinct customer_ids from your orders table with a generated list of week dates, then match each customer's orders back to that superset, keeping the latest order on or before each week date.
with base_table as (
select distinct customer_id, week_date
from orders
cross join (SELECT week_date FROM UNNEST(GENERATE_DATE_ARRAY((select min(order_date) from orders), CURRENT_DATE(), INTERVAL 7 DAY)) AS week_date)
)
select base_table.customer_id, base_table.week_date, max(order_date) as latest_order,
case
when DATE_DIFF(week_date,max(order_date),DAY) <= 7 then 'active'
when DATE_DIFF(week_date,max(order_date),DAY) >= 8 and DATE_DIFF(week_date,max(order_date),DAY) <= 21 then 'lapsing'
else 'not active'
end as status
from base_table
join orders
on orders.customer_id = base_table.customer_id
and orders.order_date <= base_table.week_date
group by 1, 2
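The CASE logic in the query above can be checked in isolation with a small function. This is only a sketch in Python: weekly_status is a hypothetical helper name, the dates are made up, and the thresholds (7 and 21 days) are the ones from the question.

```python
from datetime import date

def weekly_status(week_date, order_dates):
    """Classify a customer for one week from the dates of their orders."""
    # Only orders on or before the week date count, as in the query.
    past = [d for d in order_dates if d <= week_date]
    if not past:
        return None  # no order yet, so the query returns no row either
    days_since = (week_date - max(past)).days
    if days_since <= 7:
        return "active"
    if days_since <= 21:
        return "lapsing"
    return "not active"

orders = [date(2024, 1, 1), date(2024, 1, 3)]
for week in (date(2024, 1, 8), date(2024, 1, 15), date(2024, 2, 5)):
    print(week, weekly_status(week, orders))
```

The three weeks are 5, 12 and 33 days after the latest order, which gives one example of each status.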

Filter customers with at least 3 transactions a year for the past 2 years Presto/SQL

I have a table of customer transactions called cust_trans where each transaction made by a customer is stored as one row. It has a column called visit_date that contains the transaction date. I would like to filter the customers who transact at least 3 times a year for the past 2 years.
The data looks like below
Id visit_date
---- ------
1 01/01/2019
1 01/02/2019
1 01/01/2019
1 02/01/2020
1 02/01/2020
1 03/01/2020
1 03/01/2020
2 01/02/2019
3 02/04/2019
I would like to know the customers who visited at least 3 times every year for the past two years,
ie. I want below output.
id
---
1
In the sample data, only one customer visited at least 3 times in each of the 2 years.
I tried the query below, but it only checks whether the total number of visits is greater than or equal to 3:
select id
from
cust_scan
GROUP by
id
having count(visit_date) >= 3
and year(date(max(visit_date)))-year(date(min(visit_date))) >=2
I would appreciate any help, guidance or suggestions
One option would be to generate a list of distinct ids, cross join it with the last two years, and then bring the original table with a left join. You can then aggregate to count how many visits each id had each year. The final step is to aggregate again, and filter with a having clause
select t.id
from (
select i.id, y.yr, count(c.id) cnt
from (select distinct id from cust_scan) i
cross join (values
(date_trunc('year', current_date)),
(date_trunc('year', current_date) - interval '1' year)
) as y(yr)
left join cust_scan c
on i.id = c.id
and c.visit_date >= y.yr
and c.visit_date < y.yr + interval '1' year
group by i.id, y.yr
) t
group by t.id
having min(cnt) >= 3
Another option would be to use two correlated subqueries:
select distinct id
from cust_scan c
where
(
select count(*)
from cust_scan c1
where
c1.id = c.id
and c1.visit_date >= date_trunc('year', current_date)
and c1.visit_date < date_trunc('year', current_date) + interval '1' year
) >= 3
and (
select count(*)
from cust_scan c1
where
c1.id = c.id
and c1.visit_date >= date_trunc('year', current_date) - interval '1' year
and c1.visit_date < date_trunc('year', current_date)
) >= 3
I assume you mean calendar years. I think I would use two levels of aggregation:
select ct.id
from (select ct.id, year(visit_date) as yyyy, count(*) as cnt
from cust_trans ct
where ct.visit_date >= '2019-01-01' -- or whatever
group by ct.id, year(visit_date) -- one row per customer and year
) ct
group by ct.id
having count(*) = 2 and -- both years
min(cnt) >= 3; -- at least three transactions
If you want the last two complete years, just change the where clause in the subquery.
You can use a similar idea -- of two aggregations -- if you want the last two years relative to the current date. That would be two full years, rather than 1 and some fraction of the current year.
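As a runnable check of the two-level aggregation, the sketch below loads the question's sample rows into SQLite through Python's sqlite3 module. The dates are rewritten in ISO format and the two calendar years are fixed to 2019 and 2020; both are assumptions made for the example. SQLite's strftime('%Y', ...) stands in for year().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_trans (id INTEGER, visit_date TEXT)")
conn.executemany(
    "INSERT INTO cust_trans VALUES (?, ?)",
    [
        (1, "2019-01-01"), (1, "2019-01-02"), (1, "2019-01-01"),
        (1, "2020-02-01"), (1, "2020-02-01"), (1, "2020-03-01"), (1, "2020-03-01"),
        (2, "2019-01-02"), (3, "2019-02-04"),
    ],
)

result = conn.execute("""
    SELECT id
    FROM (
        -- First level: visits per customer and calendar year.
        SELECT id, strftime('%Y', visit_date) AS yyyy, COUNT(*) AS cnt
        FROM cust_trans
        WHERE visit_date >= '2019-01-01' AND visit_date < '2021-01-01'
        GROUP BY id, strftime('%Y', visit_date)
    ) AS per_year
    -- Second level: keep customers present in both years with at least 3 visits in each.
    GROUP BY id
    HAVING COUNT(*) = 2 AND MIN(cnt) >= 3
""").fetchall()

print(result)
```

Customer 1 has 3 visits in 2019 and 4 in 2020, so it is the only id returned.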

Get list of months from one table and counts for each from another

I'm trying to pull this through in Postgres 11.8:
SELECT count(distinct e.id) counter_employees,
(SELECT count(distinct id) FROM employees
WHERE date_trunc('month',date_hired) = period AND company = 11
) hires
FROM employees e
WHERE period IN (SELECT DISTINCT make_date(...) FROM amounts)
I can't figure out how to tell the subquery that the period it should check comes from outside the subquery. Also, the period is not taken from a table but generated, so there is no column in amounts to relate to the employees inside the subquery.
employee table:
id date_hired company
431 2020-01-03 11
422 2020-01-02 11
323 2020-02-03 11
amounts table:
payment_period amount company
202001 999 11
202002 999 11
For every payment period in amounts I want to get some data such as employee count and hires of that period:
period count hires
202001 5 1
202002 6 ...
One option uses aggregation and window functions. If you have hires for each month, then you can get the information directly from employees, like so:
select
date_trunc('month', date_hired) month_hired,
sum(count(*)) over(order by date_trunc('month', date_hired)) no_employees,
count(*) hires
from employees
group by date_trunc('month', date_hired)
On the other hand, if there are months without hires, then you could use generate_series() to create the list of months, then bring the employees with a left join, and aggregate:
select
d.month_hired,
sum(count(e.id)) over(order by d.month_hired) no_employees,
count(e.id) hires
from (
select generate_series(
date_trunc('month', min(date_hired)),
date_trunc('month', max(date_hired)),
interval '1' month
) month_hired
from employees
) d
left join employees e
on e.date_hired >= d.month_hired
and e.date_hired < d.month_hired + interval '1' month
group by d.month_hired
We could run another count for every period distilled from amounts, but that's expensive - unless there are only very few?
For more than a few, compute counts per period for the whole employees table, plus a running total. Then LEFT JOIN to it, should be pretty efficient:
SELECT mon AS period, e.mon_hired AS count, e.all_hired AS hires
FROM (
SELECT to_date(payment_period, 'YYYYMM') AS mon
FROM (SELECT DISTINCT payment_period FROM amounts) a0
) a
LEFT JOIN (
SELECT date_trunc('month', date_hired) AS mon
, count(*) AS mon_hired
, sum(count(*)) OVER (ORDER BY date_trunc('month', date_hired)) AS all_hired
FROM employees e
GROUP BY 1
) e USING (mon)
ORDER BY 1;
This assumes we can just count all employees hired so far to get the total number of hires. (Nobody ever gets fired.)
Works just fine as long as there are rows for every period. Else we need to fill in for the gaps. We can compute a complete grid, or default to the latest row in case of a missing month like this:
WITH e AS (
SELECT date_trunc('month', date_hired) AS mon
, count(*) AS mon_hired
, sum(count(*)) OVER (ORDER BY date_trunc('month', date_hired)) AS all_hired
FROM employees e
GROUP BY 1
)
SELECT mon AS period, ae.*
FROM (
SELECT to_date(payment_period, 'YYYYMM') AS mon
FROM (SELECT DISTINCT payment_period FROM amounts) a0
) a
LEFT JOIN LATERAL (
SELECT CASE WHEN e.mon = a.mon THEN e.mon_hired ELSE 0 END AS count -- ①
, e.all_hired AS hires
FROM e
WHERE e.mon <= a.mon
ORDER BY e.mon DESC
LIMIT 1
) ae ON true
ORDER BY 1;
① If nothing changed for the month, we need to fall back to the last month with data. Take the total count from there, but the monthly count is 0.
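The fallback to the latest row can be tried out on a small data set. The sketch below runs on SQLite (3.25 or later, for the window function) through Python's sqlite3 module. SQLite has no LATERAL, so correlated scalar subqueries stand in for it; that substitution is an assumption of this example and not the query above. The employees and the first two periods come from the question, period 202003 is made up to create a gap, payment_period is assumed to be text, and the output columns are named mon_hired and all_hired to keep the monthly count and the running total apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, date_hired TEXT, company INTEGER);
INSERT INTO employees VALUES (431, '2020-01-03', 11), (422, '2020-01-02', 11),
                             (323, '2020-02-03', 11);
CREATE TABLE amounts (payment_period TEXT, amount INTEGER, company INTEGER);
-- 202003 is a made-up period without hires, added to show the gap handling.
INSERT INTO amounts VALUES ('202001', 999, 11), ('202002', 999, 11), ('202003', 999, 11);
""")

rows = conn.execute("""
    WITH e AS (
        SELECT mon, mon_hired,
               SUM(mon_hired) OVER (ORDER BY mon) AS all_hired  -- running total
        FROM (
            SELECT strftime('%Y%m', date_hired) AS mon, COUNT(*) AS mon_hired
            FROM employees
            GROUP BY strftime('%Y%m', date_hired)
        ) AS m
    )
    SELECT a.payment_period AS period,
           -- Monthly hires: 0 when the period has no row in e.
           COALESCE((SELECT e.mon_hired FROM e WHERE e.mon = a.payment_period), 0) AS mon_hired,
           -- Running total: taken from the latest month at or before the period.
           (SELECT e.all_hired FROM e
            WHERE e.mon <= a.payment_period
            ORDER BY e.mon DESC LIMIT 1) AS all_hired
    FROM (SELECT DISTINCT payment_period FROM amounts) AS a
    ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```

For 202003 the monthly count is 0 while the running total carries over from 202002.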
We can run a window function over an aggregate on the same query level. See:
Group and count events per time intervals, plus running total
Related:
PostgreSQL: running count of rows for a query 'by minute'
Aside: don't omit the AS keyword for a column alias. See:
Date column arithmetic in PostgreSQL query