Using MAX function in DateADD SQL. Error - Invalid aggregate function in where clause [MAX(date)] - sql

I have a table 'CSALES' with columns customerid, transactiondate, quantity, and price. I'm trying to find customers who have not been active in the last month, based on the dates in the transactiondate column. I tried the following code, but I'm unsure about the approach and it gives a compilation error:
SELECT C.CUSTOMERID
FROM CSALES C
WHERE C.CUSTOMERID NOT IN
(
SELECT CS.CUSTOMERID FROM CSALES as CS
WHERE CS.TRANSACTIONDATE > DATEADD(month, -1, MAX(CS.TRANSACTIONDATE )
);
I'm getting the following error
SQL compilation error: Invalid aggregate function in where clause [MAX(CS.TRANSACTIONDATE)]
What changes should I make to the code to meet the requirement? Would MAX(date) be the right approach?

SELECT CUSTOMERID
FROM
CSALES
GROUP BY CUSTOMERID
HAVING
MAX(TRANSACTIONDATE) < ADD_MONTHS(CURRENT_DATE(),-1)
Shawnt00 is right: the max date in the transaction table is irrelevant if you just want any customer that hasn't been active in the last calendar month.
In Snowflake, use CURRENT_DATE() to get today's date, then ADD_MONTHS(date, int) to shift it by a number of months. Other functions work too, but these are pretty easy. To get each customer only once, with no duplicate CUSTOMERIDs, group by that column.
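To try the GROUP BY / HAVING idea without a Snowflake account, here is a minimal sketch using Python's standard-library sqlite3 module. It rests on some assumptions: SQLite has no ADD_MONTHS or DATEADD, so `date(?, '-1 month')` stands in for them; the helper name `inactive_customers`, the sample rows, and the fixed "today" value are made up for illustration.

```python
import sqlite3


def inactive_customers(rows, today):
    """Return customer ids whose latest transaction is more than one month before `today`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE csales (customerid INTEGER, transactiondate TEXT)")
    con.executemany("INSERT INTO csales VALUES (?, ?)", rows)
    # Aggregates cannot appear in WHERE, so the MAX() comparison goes in HAVING.
    # date(?, '-1 month') is SQLite's stand-in for ADD_MONTHS(CURRENT_DATE(), -1).
    query = """
        SELECT customerid
        FROM csales
        GROUP BY customerid
        HAVING MAX(transactiondate) < date(?, '-1 month')
        ORDER BY customerid
    """
    result = [r[0] for r in con.execute(query, (today,))]
    con.close()
    return result


# Hypothetical sample data: customer 1 bought recently, customer 2 did not.
rows = [(1, "2022-05-08"), (1, "2021-05-08"), (2, "2021-05-08"), (2, "2020-05-08")]
print(inactive_customers(rows, "2022-05-10"))  # prints [2]
```

Passing the reference date as a parameter, instead of calling the current date inside the query, keeps the result reproducible.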

I think I am about to just repeat Matt's code, but...
With a CTE for some test data:
WITH CSALES(CUSTOMERID, TRANSACTIONDATE) as (
SELECT * FROM VALUES
(1, '2022-05-08'::date), -- too recent
(1, '2021-05-08'::date),
(2, '2021-05-08'::date), -- old enough
(2, '2020-05-08'::date)
)
We can use HAVING for a post aggregation filter.
SELECT C.CUSTOMERID, MAX(C.TRANSACTIONDATE) as last_trans
FROM CSALES C
GROUP BY 1
HAVING last_trans < DATEADD(month,-1,current_date());
As Matt noted, there are a few ways to find "one month ago today"; he used ADD_MONTHS, and I have used DATEADD. The query gives:
| CUSTOMERID | LAST_TRANS |
| ---------: | :--------- |
| 2 | 2021-05-08 |
Now this code works the same as:
SELECT CUSTOMERID
FROM (
SELECT C.CUSTOMERID, MAX(C.TRANSACTIONDATE) as last_trans
FROM CSALES C
GROUP BY 1
)
WHERE last_trans < DATEADD(month,-1,current_date());
which gives:
CUSTOMERID
2
This hides the last transaction date, if that is what was wanted, but adds an extra SELECT layer for no real benefit.
So if we want to hide last_trans in the HAVING version, we can take the working code and push the MAX into the HAVING clause (which gives us Matt's code):
SELECT C.CUSTOMERID
FROM CSALES C
GROUP BY 1
HAVING MAX(C.TRANSACTIONDATE) < DATEADD(month,-1,current_date());
which gives for the demo code:
CUSTOMERID
2
Date Options:
There are a couple of ways to alter a date/time, depending on how you like to order your logic. I tend to prefer DATEADD:
SELECT
current_date() as cd_a,
CURRENT_DATE as cd_b,
DATEADD(month, -1, cd_a) as one_month_ago_a,
ADD_MONTHS(cd_a, -1) as one_month_ago_b;
gives:
| CD_A | CD_B | ONE_MONTH_AGO_A | ONE_MONTH_AGO_B |
| :--- | :--- | :-------------- | :-------------- |
| 2022-05-07 | 2022-05-07 | 2022-04-07 | 2022-04-07 |

SELECT
C.CUSTOMERID
FROM
CSALES C
GROUP BY
C.CUSTOMERID
HAVING
MAX(C.TRANSACTIONDATE)
<
DATEADD(
month,
-1,
(SELECT MAX(TRANSACTIONDATE) FROM CSALES)
)
Or, assuming you have a customer table...
SELECT
*
FROM
CUSTOMER C
WHERE
NOT EXISTS (
SELECT *
FROM CSALES CS
WHERE CS.CUSTOMERID = C.ID
AND CS.TRANSACTIONDATE >= DATEADD(
month,
-1,
(SELECT MAX(TRANSACTIONDATE) FROM CSALES)
)
)
Demo : dbfiddle
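If "one month" should be measured from the latest transaction in the table rather than from today, the scalar-subquery idea above can be tried with the following minimal sketch. It is an illustration under assumptions: it uses Python's sqlite3, SQLite's `date(x, '-1 month')` in place of DATEADD, and invented sample rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE csales (customerid INTEGER, transactiondate TEXT);
    -- Hypothetical rows; the latest transaction in the table is 2022-05-08.
    INSERT INTO csales VALUES
        (1, '2022-05-08'),
        (2, '2022-04-20'),
        (3, '2022-03-01'),
        (3, '2022-04-07');
""")

# The table-wide MAX is computed once in a scalar subquery, so no aggregate
# appears in a WHERE clause; the per-customer MAX lives in HAVING.
query = """
    SELECT customerid
    FROM csales
    GROUP BY customerid
    HAVING MAX(transactiondate) <
           (SELECT date(MAX(transactiondate), '-1 month') FROM csales)
    ORDER BY customerid
"""
inactive = [row[0] for row in con.execute(query)]
print(inactive)  # prints [3]
```

Here the cutoff is 2022-04-08, so only customer 3, whose last transaction was on 2022-04-07, is reported.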

There are multiple possibilities; you must check which is faster.
SELECT C.CUSTOMERID
FROM CSALES C
WHERE C.CUSTOMERID NOT IN
(
SELECT CS.CUSTOMERID FROM CSALES as CS CROSS JOIN (SELECT MAX(TRANSACTIONDATE) maxdate FROM CSALES) t1
WHERE CS.TRANSACTIONDATE > DATEADD(month, -1, maxdate)
);
GO
| CUSTOMERID |
| ---------: |
| 4 |
SELECT DISTINCT C.CUSTOMERID
FROM CSALES C CROSS JOIN (SELECT MAX(TRANSACTIONDATE) maxdate FROM CSALES) t1
WHERE NOT EXISTS (SELECT 1 FROM CSALES WHERE CUSTOMERID = c.CUSTOMERID AND TRANSACTIONDATE > DATEADD(month, -1, maxdate))
;
GO
| CUSTOMERID |
| ---------: |
| 4 |
db<>fiddle here

Related

Fill in blank dates for rolling average - CTE in Snowflake

I have two tables – activity and purchase
Activity table:
user_id date videos_watched
1 2020-01-02 3
1 2020-01-04 5
1 2020-01-07 5
Purchase table:
user_id purchase_date
1 2020-01-01
2 2020-02-02
What I would like to do is get a 30-day rolling average, since purchase, of how many videos have been watched.
The base query is like this:
SELECT
DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
AVG(A.VIDEOS_VIEWED)
FROM PURCHASE P
LEFT OUTER JOIN ACTIVITY A ON P.USER_ID = A.USER_ID AND
A.DATE >= P.PURCHASE_DATE AND A.DATE <= DATEADD(DAY, 30, P.PURCHASE_DATE)
GROUP BY 1;
However, the Activity table only has records for each day a video has been logged. I would like to fill in the blanks for days a video has not been viewed.
I have started to look into using a CTE like this:
WITH cte AS (
SELECT date('2020-01-01') as fdate
UNION ALL
SELECT CAST(DATEADD(day,1,fdate) as date)
FROM cte
WHERE fdate < date('2020-04-01')
) select * from cte
cross join purchases p
left outer join activity a
on p.user_id = a.user_id
and a.fdate = p.purchase_date
and a.date >= p.purchase_date and a.date <= dateadd(day, 30, p.purchase_date)
The end goal is to have something like this:
days_since_purchase videos_watched
1 3
2 0 --CTE coalesce inserted value
3 0
4 5
Been trying for the last couple of hours to get it right, but still can't really get the hang of it.
If you want to fill in the gaps in the result set, then I think you should be generating integers rather than dates:
WITH cte AS (
SELECT 1 as day_since_purchase
UNION ALL
SELECT 1 + day_since_purchase
FROM cte
WHERE day_since_purchase < 4
)
SELECT cte.day_since_purchase, COALESCE(avg_videos_viewed, 0)
FROM cte LEFT JOIN
(SELECT DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
AVG(A.VIDEOS_VIEWED) as avg_videos_viewed
FROM purchases p JOIN
activity a
ON p.user_id = a.user_id AND
a.fdate = p.purchase_date AND
a.date >= p.purchase_date AND
a.date <= dateadd(day, 30, p.purchase_date)
GROUP BY 1
) pa
ON pa.day_since_purchase = cte.day_since_purchase;
You can use a recursive query to generate the 30 days following each purchase, then bring in the activity table:
with cte as (
select
purchase_date,
user_id,
0 days_since_purchase,
purchase_date dt
from purchases
union all
select
purchase_date,
user_id,
days_since_purchase + 1,
dateadd(day, days_since_purchase + 1, purchase_date)
from cte
where days_since_purchase < 30
)
select
c.days_since_purchase,
avg(coalesce(a.videos_watched, 0)) avg_videos_watched
from cte c
left join activity a
on a.user_id = c.user_id
and a.fdate = c.purchase_date
and a.date = c.dt
group by c.days_since_purchase
Your question is unclear on whether the activity table has a column that stores the purchase date each row relates to. Your query uses a column fdate, but your sample data does not have it. I used that column in the query (without such a column, you might end up counting the same activity in different purchases).
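The gap-filling idea can be checked on the question's sample data with a minimal sketch in Python's sqlite3. The assumptions are worth stating: SQLite's `date(x, '+N days')` replaces DATEADD, the fdate column is left out because the sample data does not contain it, and the horizon is shortened to 6 days (a hypothetical parameter) to keep the output small.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE purchase (user_id INTEGER, purchase_date TEXT);
    CREATE TABLE activity (user_id INTEGER, date TEXT, videos_watched INTEGER);
    INSERT INTO purchase VALUES (1, '2020-01-01'), (2, '2020-02-02');
    INSERT INTO activity VALUES
        (1, '2020-01-02', 3), (1, '2020-01-04', 5), (1, '2020-01-07', 5);
""")

# The recursive CTE emits one row per purchase per day offset (0..horizon).
# The LEFT JOIN keeps days with no activity, and COALESCE turns them into 0.
query = """
    WITH RECURSIVE days(user_id, purchase_date, days_since_purchase) AS (
        SELECT user_id, purchase_date, 0 FROM purchase
        UNION ALL
        SELECT user_id, purchase_date, days_since_purchase + 1
        FROM days
        WHERE days_since_purchase < :horizon
    )
    SELECT d.days_since_purchase,
           AVG(COALESCE(a.videos_watched, 0)) AS avg_videos_watched
    FROM days d
    LEFT JOIN activity a
           ON a.user_id = d.user_id
          AND a.date = date(d.purchase_date, '+' || d.days_since_purchase || ' days')
    GROUP BY d.days_since_purchase
    ORDER BY d.days_since_purchase
"""
averages = dict(con.execute(query, {"horizon": 6}))
print(averages)
# prints {0: 0.0, 1: 1.5, 2: 0.0, 3: 2.5, 4: 0.0, 5: 0.0, 6: 2.5}
```

User 2 has no activity at all, so every day averages user 1's count with a zero; that is why day 1 shows 1.5 rather than 3.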

SQL - One Table with Two Date Columns. Count and Join

I have a table (vOKPI_Tickets) that has the following columns:
|CreationDate | CompletionDate|
I'd like to get a count on each of those columns, and group them by date. It should look something like this when complete:
| Date | Count-Created | Count-Completed |
I can get each of the counts individually, by doing something like this:
SELECT COUNT(TicketId)
FROM vOKPI_Tickets
GROUP BY CreationDate
and
SELECT COUNT(TicketId)
FROM vOKPI_Tickets
GROUP BY CompletionDate
How can I combine the output into one table? I should also note that this will become a View.
Thanks in advance
Simple generic approach:
select
coalesce(crte.creationdate, cmpl.CompletionDate) as theDate,
crte.cnt as created,
cmpl.cnt as completed
from
(select creationdate, count (*) as cnt from vOKPI_Tickets where creationdate is not null group by creationdate) crte
full join
(select CompletionDate, count (*) as cnt from vOKPI_Tickets where CompletionDate is not null group by CompletionDate) cmpl
on crte.creationdate = cmpl.CompletionDate
You can unpivot and aggregate. A general method is:
select dte, sum(created), sum(completed)
from ((select creationdate as dte, 1 as created, 0 as completed
from vOKPI_Tickets
) union all
(select CompletionDate as dte, 0 as created, 1 as completed
from vOKPI_Tickets
)
) t
group by dte;
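The unpivot-and-aggregate approach is portable enough to run in SQLite, so here is a minimal sketch using Python's sqlite3. The table name `tickets` and its three rows are hypothetical stand-ins for vOKPI_Tickets, and the `WHERE dte IS NOT NULL` filter is an addition so that tickets with no completion date do not produce a NULL date row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tickets (TicketId INTEGER, CreationDate TEXT, CompletionDate TEXT);
    INSERT INTO tickets VALUES
        (1, '2020-01-01', '2020-01-02'),
        (2, '2020-01-01', NULL),
        (3, '2020-01-02', '2020-01-02');
""")

# Each ticket contributes one "created" row and one "completed" row;
# summing the 0/1 flags per date yields both counts in a single pass.
query = """
    SELECT dte, SUM(created) AS count_created, SUM(completed) AS count_completed
    FROM (
        SELECT CreationDate AS dte, 1 AS created, 0 AS completed FROM tickets
        UNION ALL
        SELECT CompletionDate AS dte, 0 AS created, 1 AS completed FROM tickets
    ) t
    WHERE dte IS NOT NULL
    GROUP BY dte
    ORDER BY dte
"""
counts = con.execute(query).fetchall()
print(counts)  # prints [('2020-01-01', 2, 0), ('2020-01-02', 1, 2)]
```

Unlike the full join version, this form returns 0 instead of NULL for a date that has only creations or only completions.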
In SQL Server, you can use cross apply for this:
select d.dt, sum(d.is_created) count_created, sum(d.is_completed) count_completed
from vokpi_tickets t
cross apply (values (creationdate, 1, 0), (completiondate, 0, 1)) as d(dt, is_created, is_completed)
where d.dt is not null
group by d.dt

how to filter data in sql based on percentile

I have 2 tables. The first contains customer information such as id, age, and name. The second contains the customer id, the product they purchased, and the purchase_date (dates range from 2016 to 2018).
Table 1
-------
customer_id
customer_age
customer_name
Table2
------
customer_id
product
purchase_date
My desired result is a table containing the customer_name and product for customers who made a purchase in 2017 and are older than 75% of the customers who made a purchase in 2016.
Depending on your flavor of SQL, you can get quartiles using the more general ntile analytical function. This basically adds a new column to your query.
SELECT MIN(customer_age) as min_age FROM (
SELECT customer_id, customer_age, ntile(4) OVER(ORDER BY customer_age) AS q4 FROM table1
WHERE customer_id IN (
SELECT customer_id FROM table2 WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
) q
WHERE q4=4
This returns the lowest age of the 4th-quartile customers, which can be used in a subquery against the customers who made purchases in 2017.
The argument to ntile is how many buckets you want to divide into. In this case 75%+ equals 4th quartile, so 4 buckets is OK. The OVER() clause specifies what you want to sort by (customer_age in our case), and also lets us partition (group) the data if we want to, say, create multiple rankings for different years or countries.
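To see how the ntile threshold plugs into the 2017 filter, here is a minimal sketch using Python's sqlite3. It assumes a SQLite build with window functions (3.25 or later), uses the question's table1/table2 names with invented rows, and treats "older than 75%" as "at least the lowest age in the top quartile", which is one possible reading.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (customer_id INTEGER, customer_age INTEGER, customer_name TEXT);
    CREATE TABLE table2 (customer_id INTEGER, product TEXT, purchase_date TEXT);
    INSERT INTO table1 VALUES
        (1, 20, 'A'), (2, 25, 'B'), (3, 30, 'C'), (4, 35, 'D'),
        (5, 40, 'E'), (6, 45, 'F'), (7, 50, 'G'), (8, 55, 'H'), (9, 60, 'I');
    -- Customers 1-8 buy in 2016; customers 2, 7 and 9 buy in 2017.
    INSERT INTO table2 VALUES
        (1, 'p1', '2016-01-10'), (2, 'p1', '2016-02-10'), (3, 'p1', '2016-03-10'),
        (4, 'p1', '2016-04-10'), (5, 'p1', '2016-05-10'), (6, 'p1', '2016-06-10'),
        (7, 'p1', '2016-07-10'), (8, 'p1', '2016-08-10'),
        (2, 'p2', '2017-03-01'), (7, 'p3', '2017-04-01'), (9, 'p4', '2017-05-01');
""")

# q ranks the 2016 buyers into four age buckets; threshold keeps the lowest
# age in the top bucket; the final SELECT filters the 2017 buyers against it.
query = """
    WITH q AS (
        SELECT customer_age, NTILE(4) OVER (ORDER BY customer_age) AS quartile
        FROM table1
        WHERE customer_id IN (
            SELECT customer_id FROM table2
            WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
    ),
    threshold AS (SELECT MIN(customer_age) AS min_age FROM q WHERE quartile = 4)
    SELECT c.customer_name, p.product
    FROM table1 c
    JOIN table2 p ON p.customer_id = c.customer_id
    JOIN threshold t ON c.customer_age >= t.min_age
    WHERE p.purchase_date >= '2017-01-01' AND p.purchase_date < '2018-01-01'
    ORDER BY c.customer_name
"""
result = con.execute(query).fetchall()
print(result)  # prints [('G', 'p3'), ('I', 'p4')]
```

With eight 2016 buyers the top quartile holds ages 50 and 55, so the threshold is 50 and customer B (age 25) is filtered out.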
Age is a horrible field to include in a database. Every day it changes. You should have date-of-birth or something similar.
To get the 75% oldest value in 2016, there are several possibilities. I usually go for row_number() and count(*):
select min(customer_age)
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt;
Then, to use this for a query for 2017:
with a2016 as (
select min(customer_age) as customer_age
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt
)
select c.*, cp.product_id
from customers c join
customer_products cp
on cp.customer_id = c.customer_id and
cp.purchase_date >= '2017-01-01' and
cp.purchase_date < '2018-01-01' join
a2016 a
on c.customer_age >= a.customer_age;

How to do a group by without having to pass all the columns from the select?

I have the following select, whose goal is to select all customers who had no sales since the day X, and also bringing the date of the last sale and the number of the sale:
select s.customerId, s.saleId, max (s.date) from sales s
group by s.customerId, s.saleId
having max(s.date) <= '05-16-2013'
This way it brings me the following:
19 | 300 | 26/09/2005
19 | 356 | 29/09/2005
27 | 842 | 10/05/2012
In other words, the first 2 lines are from the same customer (id 19). I wish to get only one record for each client, which would be the record with the max date; in this case, the second record from this list.
By that logic, I should take off s.saleId from the "group by" clause, but if I do, of course, I get the error:
Invalid expression in the select list (not contained in either an
aggregate function or the GROUP BY clause)
I'm using Firebird 1.5
How can I do this?
GROUP BY summarizes data by aggregating a group of rows, returning one row per group. You're using the aggregate function max(), which will return the maximum value from one column for a group of rows.
Let's look at some data. I renamed the column you called "date".
create table sales (
customerId integer not null,
saleId integer not null,
saledate date not null
);
insert into sales values
(1, 10, '2013-05-13'),
(1, 11, '2013-05-14'),
(1, 12, '2013-05-14'),
(1, 13, '2013-05-17'),
(2, 20, '2013-05-11'),
(2, 21, '2013-05-16'),
(2, 31, '2013-05-17'),
(2, 32, '2013-03-01'),
(3, 33, '2013-05-14'),
(3, 35, '2013-05-14');
You said
In another words, the first 2 lines are from the same customer(id 19), i wish he'd get only one record for each client, which would be the record with the max date, in the case, the second record from this list.
select s.customerId, max (s.saledate)
from sales s
where s.saledate <= '2013-05-16'
group by s.customerId
order by customerId;
customerId max
--
1 2013-05-14
2 2013-05-16
3 2013-05-14
What does that table mean? It means that the latest date on or before May 16 on which customer "1" bought something was May 14; the latest date on or before May 16 on which customer "2" bought something was May 16. If you use this derived table in joins, it will return predictable results with consistent meaning.
Now let's look at a slightly different query. MySQL permits this syntax, and returns the result set below.
select s.customerId, s.saleId, max(s.saledate) max_sale
from sales s
where s.saledate <= '2013-05-16'
group by s.customerId
order by customerId;
customerId saleId max_sale
--
1 10 2013-05-14
2 20 2013-05-16
3 33 2013-05-14
The sale with ID "10" didn't happen on May 14; it happened on May 13. This query has produced a falsehood. Joining this derived table with the table of sales transactions will compound the error.
That's why Firebird correctly raises an error. The solution is to drop saleId from the SELECT clause.
Now, having said all that, you can find the customers who have had no sales since May 16 like this.
select distinct customerId from sales
where customerID not in
(select customerId
from sales
where saledate >= '2013-05-16')
And you can get the right customerId and the "right" saleId like this. (I say "right" saleId, because there could be more than one on the day in question. I just chose the max.)
select sales.customerId, sales.saledate, max(saleId)
from sales
inner join (select customerId, max(saledate) max_date
from sales
where saledate < '2013-05-16'
group by customerId) max_dates
on sales.customerId = max_dates.customerId
and sales.saledate = max_dates.max_date
inner join (select distinct customerId
from sales
where customerID not in
(select customerId
from sales
where saledate >= '2013-05-16')) no_sales
on sales.customerId = no_sales.customerId
group by sales.customerId, sales.saledate
Personally, I find common table expressions make it easier for me to read SQL statements like that without getting lost in the SELECTs.
with no_sales as (
select distinct customerId
from sales
where customerID not in
(select customerId
from sales
where saledate >= '2013-05-16')
),
max_dates as (
select customerId, max(saledate) max_date
from sales
where saledate < '2013-05-16'
group by customerId
)
select sales.customerId, sales.saledate, max(saleId)
from sales
inner join max_dates
on sales.customerId = max_dates.customerId
and sales.saledate = max_dates.max_date
inner join no_sales
on sales.customerId = no_sales.customerId
group by sales.customerId, sales.saledate
You can use the following query.
EDIT: changed after a comment by likeitlikeit, so that it returns only one row per customerId even when a customer has multiple saleIds matching the condition:
select x.customerID, max(x.saleID), max(x.x_date) from (
select s.customerId, s.saleId, max (s.date) x_date from sales s
group by s.customerId, s.saleId
having max(s.date) <= '05-16-2013'
and max(s.date) = ( select max(s1.date)
from sales s1
where s1.customerId = s.customerId)) x
group by x.customerID
You can try taking the max of s.saleId (MAX(s.saleId)) and removing it from the GROUP BY clause.
A subquery should do the job. I can't test it right now, but it seems OK:
SELECT s.customerId, s.saleId, subq.maxdate
FROM sales AS s
INNER JOIN (SELECT customerId, MAX(date) AS maxdate
FROM sales
GROUP BY customerId
HAVING MAX(date) <= '05-16-2013'
) AS subq
ON s.customerId = subq.customerId AND s.date = subq.maxdate

SQL Grouping Issues

I'm attempting to write a query that will return any customer that has multiple work orders with these work orders falling on different days of the week. Every work order for each customer should be falling on the same day of the week so I want to know where this is not the case so I can fix it.
The name of the table is Core.WorkOrder, and it contains a column called CustomerId that specifies which customer each work order belongs to. There is a column called TimeWindowStart that can be used to see which day each work order falls on (I'm using DATENAME(weekday, TimeWindowStart) to do so).
Any ideas how to write this query? I'm stuck here.
Thanks!
Select ...
From WorkOrder As W
Where Exists (
Select 1
From WorkOrder As W1
Where W1.CustomerId = W.CustomerId
And DatePart( dw, W1.TimeWindowStart ) <> DatePart( dw, W.TimeWindowStart )
)
SELECT *
FROM (
SELECT *,
COUNT(dp) OVER (PARTITION BY CustomerID) AS cnt
FROM (
SELECT DISTINCT CustomerID, DATEPART(dw, TimeWindowStart) AS dp
FROM workOrder
) q
) q
WHERE cnt >= 2
SELECT CustomerId,
MIN(DATENAME(weekday, TimeWindowStart)),
MAX(DATENAME(weekday, TimeWindowStart))
FROM Core.WorkOrder
GROUP BY CustomerId
HAVING MIN(DATENAME(weekday, TimeWindowStart)) != MAX(DATENAME(weekday, TimeWindowStart))
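A close variation on the MIN/MAX comparison is to count distinct weekdays per customer. The minimal sketch below uses Python's sqlite3, so it swaps DATENAME(weekday, ...) for SQLite's `strftime('%w', ...)`; the table name `work_order` and its rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE work_order (CustomerId INTEGER, TimeWindowStart TEXT);
    INSERT INTO work_order VALUES
        (1, '2021-03-01 08:00:00'),  -- customer 1: one week apart, same weekday
        (1, '2021-03-08 08:00:00'),
        (2, '2021-03-01 08:00:00'),  -- customer 2: consecutive days
        (2, '2021-03-02 08:00:00');
""")

# strftime('%w', ...) yields the weekday as '0'..'6'; a customer with more
# than one distinct value has work orders on different days of the week.
query = """
    SELECT CustomerId
    FROM work_order
    GROUP BY CustomerId
    HAVING COUNT(DISTINCT strftime('%w', TimeWindowStart)) > 1
    ORDER BY CustomerId
"""
mixed = [row[0] for row in con.execute(query)]
print(mixed)  # prints [2]
```

COUNT(DISTINCT ...) and the MIN != MAX test flag the same customers; the count form also tells you how many different weekdays are involved if you move it into the SELECT list.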