Segment purchases based on new vs returning - sql

I'm trying to write a query that can select a particular date and count how many of those customers have placed orders previously and how many are new. For simplicity, here is the table layout:
id (auto) | cust_id | purchase_date
-----------------------------------
1 | 1 | 2010-11-15
2 | 2 | 2010-11-15
3 | 3 | 2010-11-14
4 | 1 | 2010-11-13
5 | 3 | 2010-11-12
I was trying to select orders by a date and then join any previous orders for the same user_id from earlier dates, then count how many had previous orders vs. how many didn't. This was my failed attempt:
SELECT SUM(
CASE WHEN id IS NULL
THEN 1
ELSE 0
END ) AS new, SUM(
CASE WHEN id IS NOT NULL
THEN 1
ELSE 0
END ) AS returning
FROM (
SELECT o1 . *
FROM orders AS o
LEFT JOIN orders AS o1 ON ( o1.user_id = o.user_id
AND DATE( o1.created ) = "2010-11-15" )
WHERE DATE( o.created ) < "2010-11-15"
GROUP BY o.user_id
) AS t

Given a reference date (2010-11-15), we are interested in the number of distinct customers who placed an order on that date (A), how many of those had placed an order previously (B), and how many had not (C). Clearly, A = B + C.
Q1: Count of customers who placed an order on the reference date
SELECT COUNT(DISTINCT Cust_ID)
FROM Orders
WHERE Purchase_Date = '2010-11-15';
Q2: List of customers placing order on reference date
SELECT DISTINCT Cust_ID
FROM Orders
WHERE Purchase_Date = '2010-11-15';
Q3: List of customers who placed an order on reference date who had ordered before
SELECT DISTINCT o1.Cust_ID
FROM Orders AS o1
JOIN (SELECT DISTINCT o2.Cust_ID
FROM Orders AS o2
WHERE o2.Purchase_Date = '2010-11-15') AS c1
ON o1.Cust_ID = c1.Cust_ID
WHERE o1.Purchase_Date < '2010-11-15';
Q4: Count of customers who placed an order on the reference date who had ordered before
SELECT COUNT(DISTINCT o1.Cust_ID)
FROM Orders AS o1
JOIN (SELECT DISTINCT o2.Cust_ID
FROM Orders AS o2
WHERE o2.Purchase_Date = '2010-11-15') AS c1
ON o1.Cust_ID = c1.Cust_ID
WHERE o1.Purchase_Date < '2010-11-15';
Q5: Combining Q1 and Q4
There are several ways to do the combining. One is to use Q1 and Q4 as (complicated) expressions in the select-list; another is to use them as tables in the FROM clause which don't need a join between them because each is a single-row, single-column table that can be joined in a Cartesian product. Another would be a UNION, where each row is tagged with what it calculates.
SELECT (SELECT COUNT(DISTINCT Cust_ID)
FROM Orders
WHERE Purchase_Date = '2010-11-15') AS Total_Customers,
(SELECT COUNT(DISTINCT o1.Cust_ID)
FROM Orders AS o1
JOIN (SELECT DISTINCT o2.Cust_ID
FROM Orders AS o2
WHERE o2.Purchase_Date = '2010-11-15') AS c1
ON o1.Cust_ID = c1.Cust_ID
WHERE o1.Purchase_Date < '2010-11-15') AS Returning_Customers
FROM Dual;
(I'm blithely assuming MySQL has a DUAL table - similar to Oracle's. If not, it is trivial to create a table with a single column containing a single row of data. Update 2: bashing the MySQL 5.5 Manual shows that 'FROM Dual' is supported but not needed; MySQL is happy without a FROM clause.)
Update 1: added qualifier 'o1.Cust_ID' in key locations to avoid 'ambiguous column name' as indicated in the comment.
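As a rough check of Q5, here is a minimal sketch that runs the combined query against the sample rows from the question. It uses SQLite through Python's sqlite3 module rather than MySQL (an assumption made only so the example runs anywhere), stores the dates as ISO text, and drops FROM Dual, which SQLite does not need. The function name new_vs_returning is made up for illustration.

```python
import sqlite3


def new_vs_returning(conn, ref_date):
    """Return (total, returning, new) distinct-customer counts for ref_date."""
    total, returning = conn.execute(
        """
        SELECT (SELECT COUNT(DISTINCT cust_id)
                  FROM orders
                 WHERE purchase_date = :d) AS total_customers,
               (SELECT COUNT(DISTINCT o1.cust_id)
                  FROM orders AS o1
                  JOIN (SELECT DISTINCT o2.cust_id
                          FROM orders AS o2
                         WHERE o2.purchase_date = :d) AS c1
                    ON o1.cust_id = c1.cust_id
                 WHERE o1.purchase_date < :d) AS returning_customers
        """,
        {"d": ref_date},
    ).fetchone()
    # A = B + C, so the new customers are whatever is left over.
    return total, returning, total - returning


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, purchase_date TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 1, "2010-11-15"),
        (2, 2, "2010-11-15"),
        (3, 3, "2010-11-14"),
        (4, 1, "2010-11-13"),
        (5, 3, "2010-11-12"),
    ],
)

print(new_vs_returning(conn, "2010-11-15"))  # (2, 1, 1)
```

On 2010-11-15 customers 1 and 2 ordered; customer 1 had already ordered on 2010-11-13, so one is returning and one is new.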

How about
SELECT * FROM
(SELECT * FROM
(SELECT CUST_ID, COUNT(*) AS ORDER_COUNT, 1 AS OLD_CUSTOMER, 0 AS NEW_CUSTOMER
FROM ORDERS
GROUP BY CUST_ID
HAVING ORDER_COUNT > 1)
UNION ALL
(SELECT CUST_ID, COUNT(*) AS ORDER_COUNT, 0 AS OLD_CUSTOMER, 1 AS NEW_CUSTOMER
FROM ORDERS
GROUP BY CUST_ID
HAVING ORDER_COUNT = 1)) G
INNER JOIN
(SELECT CUST_ID, ORDER_DATE
FROM ORDERS) O
USING (CUST_ID)
WHERE ORDER_DATE = [date of interest] AND
OLD_CUSTOMER = [0 or 1, depending on what you want] AND
NEW_CUSTOMER = [0 or 1, depending on what you want]
Not sure if that'll do the whole thing, but it might provide a starting point.
Share and enjoy.

select count(distinct o1.cust_id) as repeat_count,
count(distinct o.cust_id)-count(distinct o1.cust_id) as new_count
from orders o
left join (select cust_id
from orders
where purchase_date < "2010-11-15"
group by cust_id) o1
on o.cust_id = o1.cust_id
where o.purchase_date = "2010-11-15"

SQL Selecting & Counting From Another Table

I have this query that works excellently and gives me the results I want, however, does anybody know how I can remove any rows that have 0 orders? I am sure it is something simple, I just can't get my head around it.
In other words, should it only show the top 2 rows?
SELECT customers.id, customers.companyname, customers.orgtype,
(SELECT COALESCE(SUM(invoicetotal), 0)
FROM invoice_summary
WHERE invoice_summary.cid = customers.ID
and invoice_summary.submitted between '2022-08-01' and '2022-08-31'
) AS total,
(SELECT COUNT(invoicenumber)
FROM invoice_summary
WHERE invoice_summary.cid = customers.ID
and invoice_summary.submitted between '2022-08-01' and '2022-08-31'
) AS orders
FROM customers WHERE customers.orgtype = 10
ORDER BY total DESC
ID | Company | Org | Total | Orders
-----------------------------------
1232 | ACME 1 | 10 | 523.36 | 3
6554 | ACME 2 | 10 | 411.03 | 2
1220 | ACME 3 | 10 | 0.00 | 0
4334 | ACME 4 | 10 | 0.00 | 0
You can use a CTE to keep the query simple:
WITH CTE_Orders AS (
SELECT customers.id, customers.companyname, customers.orgtype,
(SELECT COALESCE(SUM(invoicetotal), 0)
FROM invoice_summary
WHERE invoice_summary.cid = customers.ID
and invoice_summary.submitted between '2022-08-01' and '2022-08-31'
) AS total,
(SELECT COUNT(invoicenumber)
FROM invoice_summary
WHERE invoice_summary.cid = customers.ID
and invoice_summary.submitted between '2022-08-01' and '2022-08-31'
) AS orders
FROM customers WHERE customers.orgtype = 10
)
SELECT * FROM CTE_Orders WHERE orders > 0
ORDER BY total DESC
You can find additional information about CTEs in the Microsoft documentation: https://learn.microsoft.com/fr-fr/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver16
You can do this by transforming your subquery into a CROSS APPLY of a pre-aggregated table:
SELECT
c.id,
c.companyname,
c.orgtype,
ins.total,
ins.orders
FROM customers c
CROSS APPLY (
SELECT
COUNT(*) AS orders,
ISNULL(SUM(ins.invoicetotal), 0) AS total
FROM invoice_summary ins
WHERE ins.cid = c.ID
AND ins.submitted between '20220801' and '20220831'
GROUP BY () -- do not remove the GROUP BY: without it, customers with no invoices still get a row (0 orders)
) ins
WHERE c.orgtype = 10
ORDER BY
ins.total DESC;
You can also do this with an INNER JOIN against it
SELECT
c.id,
c.companyname,
c.orgtype,
ins.total,
ins.orders
FROM customers c
INNER JOIN (
SELECT
ins.cid,
COUNT(*) AS orders,
ISNULL(SUM(ins.invoicetotal), 0) AS total
FROM invoice_summary ins
WHERE ins.submitted between '20220801' and '20220831'
GROUP BY ins.cid
) ins ON ins.cid = c.ID
WHERE c.orgtype = 10
ORDER BY
ins.total DESC;
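Here is a small sketch of why the INNER JOIN version drops the zero-order customers: a customer with no invoices in the date range has no row in the pre-aggregated derived table, so the join filters them out. It runs on SQLite through Python's sqlite3 module rather than SQL Server (an assumption for portability), so ISNULL becomes COALESCE and the dates are ISO text. The individual invoice rows are invented so that they add up to the totals shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, companyname TEXT, orgtype INTEGER);
    CREATE TABLE invoice_summary (
        invoicenumber INTEGER PRIMARY KEY, cid INTEGER,
        invoicetotal REAL, submitted TEXT);

    INSERT INTO customers VALUES
        (1232, 'ACME 1', 10), (6554, 'ACME 2', 10),
        (1220, 'ACME 3', 10), (4334, 'ACME 4', 10);

    -- Hypothetical invoices; only the per-customer totals come from the question.
    INSERT INTO invoice_summary VALUES
        (1, 1232, 200.00, '2022-08-03'),
        (2, 1232, 300.00, '2022-08-10'),
        (3, 1232,  23.36, '2022-08-20'),
        (4, 6554, 400.00, '2022-08-05'),
        (5, 6554,  11.03, '2022-08-25'),
        (6, 1220,  99.00, '2022-07-15');  -- outside the August window
    """
)

raw = conn.execute(
    """
    SELECT c.id, c.companyname, c.orgtype, ins.total, ins.orders
      FROM customers AS c
      JOIN (SELECT cid,
                   COUNT(*) AS orders,
                   COALESCE(SUM(invoicetotal), 0) AS total
              FROM invoice_summary
             WHERE submitted BETWEEN '2022-08-01' AND '2022-08-31'
             GROUP BY cid) AS ins
        ON ins.cid = c.id
     WHERE c.orgtype = 10
     ORDER BY ins.total DESC
    """
).fetchall()

# Round the float totals for display and comparison.
rows = [(cid, name, org, round(total, 2), orders) for cid, name, org, total, orders in raw]
for row in rows:
    print(row)
```

Only ACME 1 and ACME 2 come back; ACME 3 has an invoice but not in August, and ACME 4 has none at all.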
Quick and dirty way would be to dump your results into a temp table, delete the records you don't want, then select what remains.
Add this to the end of your select before the FROM clause:
INTO #temptable
Then delete the records you don't want:
DELETE FROM #temptable WHERE [Orders] = 0
Then just select from the temp table.
There are other ways to do this, and you should read up on the downsides of temp tables before implementing this solution.

How to calculate rows count in where statement in sql?

I have two tables in SQL Server:
order (columns: order_id, payment_id)
payment (columns: payment_id, is_pay)
I want to get all orders with two more properties:
How many rows where is_pay is 1:
where payment_id = <...> and payment.is_pay = 1
And the count of the rows (without the first filter)
select count(*)
from payment
where payment_id = <...>
So I wrote this query:
select
*,
(select count(1) from payment p
where p.payment_id = o.payment_id and p.is_pay = 1) as total
from
order o
The problem is how to calculate the rows without the is_pay = 1?
I mean the "some of many"
First aggregate in payment and then join to order:
SELECT o.*, p.total_pay, p.total
FROM [order] o
LEFT JOIN (
SELECT payment_id, SUM(is_pay) total_pay, COUNT(*) total
FROM payment
GROUP BY payment_id
) p ON p.payment_id = o.payment_id;
Change LEFT to INNER join if all orders have at least 1 payment.
Also, if is_pay's data type is BIT, change SUM(is_pay) to:
SUM(CASE WHEN is_pay = 1 THEN 1 ELSE 0 END)
Use a join with conditional aggregation:
SELECT
o.payment_id,
COUNT(CASE WHEN p.is_pay = 1 THEN 1 END) AS pay_cnt,
COUNT(p.payment_id) AS all_cnt
FROM "order" o
LEFT JOIN payment p
ON o.payment_id = p.payment_id
GROUP BY
o.payment_id;
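To see the conditional aggregation at work, here is a minimal sketch on SQLite via Python's sqlite3 module (an assumption; the question is about SQL Server). The rows are invented: payment_id 10 has three payment rows of which two are paid, payment_id 20 has one unpaid row, and payment_id 30 has none. I also group by order_id here, which is my addition so that each order gets its own line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    '''
    CREATE TABLE "order" (order_id INTEGER PRIMARY KEY, payment_id INTEGER);
    CREATE TABLE payment (payment_id INTEGER, is_pay INTEGER);

    -- Hypothetical sample rows.
    INSERT INTO "order" VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO payment VALUES (10, 1), (10, 0), (10, 1), (20, 0);
    '''
)

rows = conn.execute(
    '''
    SELECT o.order_id,
           o.payment_id,
           -- CASE yields NULL when is_pay <> 1, and COUNT skips NULLs.
           COUNT(CASE WHEN p.is_pay = 1 THEN 1 END) AS pay_cnt,
           -- Counting a payment column (not *) gives 0 for unmatched orders.
           COUNT(p.payment_id) AS all_cnt
      FROM "order" AS o
      LEFT JOIN payment AS p
        ON p.payment_id = o.payment_id
     GROUP BY o.order_id, o.payment_id
     ORDER BY o.order_id
    '''
).fetchall()

print(rows)  # [(1, 10, 2, 3), (2, 20, 0, 1), (3, 30, 0, 0)]
```

Note that COUNT(*) instead of COUNT(p.payment_id) would report 1 for order 3, because the LEFT JOIN still produces one (all-NULL) row for it.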
You can use a lateral join (outer apply) for this:
select o.*, p.*
from orders o outer apply
(select count(*) as num_payments,
sum(case when is_pay = 1 then 1 else 0 end) as num_payments_1
from payments p
where p.payment_id = o.payment_id
) p;
Note: Assuming that is_pay only takes on the values of 0 and 1 (which seems reasonable given the name), you can simplify this to:
select o.*, p.*
from orders o outer apply
(select count(*) as num_payments,
sum(is_pay) as num_payments_1
from payments p
where p.payment_id = o.payment_id
) p;
If you are looking for counts per payment id then use this:
select
payment.payment_id,
count(*) as total,
count(case when payment.is_pay = 1 then 1 end) as total_is_pay_orders
from orders
left join payment
on orders.payment_id = payment.payment_id
group by payment.payment_id

Sum of all values except the first

I have the following three tables:
Customers:
Cust_ID,
Cust_Name
Products:
Prod_ID,
Prod_Price
Orders:
Order_ID,
Cust_ID,
Prod_ID,
Quantity,
Order_Date
How do I display each customer and how much they spent excluding their very first purchase?
[A] - I can get the total by multiplying Products.Prod_Price and Orders.Quantity, then GROUP by Cust_ID
[B] - I also can get the first purchase by using TOP 1 on Order_Date for each customer.
But I couldn't figure out how to produce [A]-[B] in one query.
Any help will be greatly appreciated.
For SQL-Server 2005, 2008 and 2008R2:
; WITH cte AS
( SELECT
c.Cust_ID, c.Cust_Name,
Amount = o.Quantity * p.Prod_Price,
Rn = ROW_NUMBER() OVER (PARTITION BY c.Cust_ID
ORDER BY o.Order_Date)
FROM
Customers AS c
JOIN
Orders AS o ON o.Cust_ID = c.Cust_ID
JOIN
Products AS p ON p.Prod_ID = o.Prod_ID
)
SELECT
Cust_ID, Cust_Name,
AmountSpent = SUM(Amount)
FROM
cte
WHERE
Rn >= 2
GROUP BY
Cust_ID, Cust_Name ;
For SQL-Server 2012, using the FIRST_VALUE() analytic function:
SELECT DISTINCT
c.Cust_ID, c.Cust_Name,
AmountSpent = SUM(o.Quantity * p.Prod_Price)
OVER (PARTITION BY c.Cust_ID)
- FIRST_VALUE(o.Quantity * p.Prod_Price)
OVER (PARTITION BY c.Cust_ID
ORDER BY o.Order_Date)
FROM
Customers AS c
JOIN
Orders AS o ON o.Cust_ID = c.Cust_ID
JOIN
Products AS p ON p.Prod_ID = o.Prod_ID ;
Another way (that works in 2012 only) using OFFSET FETCH and CROSS APPLY:
SELECT
c.Cust_ID, c.Cust_Name,
AmountSpent = SUM(x.Quantity * x.Prod_Price)
FROM
Customers AS c
CROSS APPLY
( SELECT
o.Quantity, p.Prod_Price
FROM
Orders AS o
JOIN
Products AS p ON p.Prod_ID = o.Prod_ID
WHERE
o.Cust_ID = c.Cust_ID
ORDER BY
o.Order_Date
OFFSET
1 ROW
-- FETCH NEXT -- not needed,
-- 20000000000 ROWS ONLY -- can be removed
) AS x
GROUP BY
c.Cust_ID, c.Cust_Name ;
Tested at SQL-Fiddle
Note that the second solution returns also the customers with only one order (with the Amount as 0) while the other two solutions do not return those customers.
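For anyone who wants to try the ROW_NUMBER() version without a SQL Server instance, here is a minimal sketch on SQLite via Python's sqlite3 module (an assumption; window functions need SQLite 3.25 or later). The sample rows are invented, the `Amount = ...` aliases are rewritten as `... AS Amount`, and I added Order_ID as a tie-breaker in the ORDER BY so that two orders on the same date are numbered deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (Cust_ID INTEGER PRIMARY KEY, Cust_Name TEXT);
    CREATE TABLE Products  (Prod_ID INTEGER PRIMARY KEY, Prod_Price REAL);
    CREATE TABLE Orders (
        Order_ID INTEGER PRIMARY KEY, Cust_ID INTEGER, Prod_ID INTEGER,
        Quantity INTEGER, Order_Date TEXT);

    -- Hypothetical sample rows.
    INSERT INTO Customers VALUES (1, 'Able'), (2, 'Bob'), (3, 'Charlie');
    INSERT INTO Products  VALUES (1, 10.0);
    INSERT INTO Orders VALUES
        (1, 1, 1, 1, '2014-01-01'),   -- Able's first order (excluded)
        (2, 1, 1, 2, '2014-01-02'),
        (3, 1, 1, 2, '2014-01-03'),
        (4, 2, 1, 1, '2014-01-01'),   -- Bob's first order (excluded)
        (5, 2, 1, 3, '2014-01-05'),
        (6, 3, 1, 1, '2014-01-02');   -- Charlie's only order (excluded)
    """
)

rows = conn.execute(
    """
    WITH cte AS (
        SELECT c.Cust_ID, c.Cust_Name,
               o.Quantity * p.Prod_Price AS Amount,
               ROW_NUMBER() OVER (PARTITION BY c.Cust_ID
                                  ORDER BY o.Order_Date, o.Order_ID) AS Rn
          FROM Customers AS c
          JOIN Orders    AS o ON o.Cust_ID = c.Cust_ID
          JOIN Products  AS p ON p.Prod_ID = o.Prod_ID
    )
    SELECT Cust_ID, Cust_Name, SUM(Amount) AS AmountSpent
      FROM cte
     WHERE Rn >= 2          -- skip each customer's first order
     GROUP BY Cust_ID, Cust_Name
     ORDER BY Cust_ID
    """
).fetchall()

print(rows)
```

Charlie does not appear in the result because his only order is row number 1, which matches the note above about customers with a single order.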
Which version of SQL? If 2012 you might be able to do something interesting with OFFSET 1, but I'd have to ponder much more how that works with grouping.
EDIT: Adding a 2012 specific solution inspired by @ypercube
I wanted to be able to use OFFSET 1 within the WINDOW to do it all in one step, but the syntax I want isn't valid:
SUM(o.Quantity * p.Prod_Price) OVER (PARTITION BY c.Cust_ID
ORDER BY o.Order_Date
OFFSET 1)
Instead I can specify the row frame, but then I have to filter the result set down to the correct rows. The query plan is different from @ypercube's, but they both show 50% when run together. They each run twice as fast as my original answer below.
WITH cte AS (
SELECT c.Cust_ID
,c.Cust_Name
,SUM(o.Quantity * p.Prod_Price) OVER(PARTITION BY c.Cust_ID
ORDER BY o.Order_ID
ROWS BETWEEN 1 FOLLOWING
AND UNBOUNDED FOLLOWING) AmountSpent
,rn = ROW_NUMBER() OVER(PARTITION BY c.Cust_ID ORDER BY o.Order_ID)
FROM Customers AS c
INNER JOIN
Orders AS o ON o.Cust_ID = c.Cust_ID
INNER JOIN
Products AS p ON p.Prod_ID = o.Prod_ID
)
SELECT Cust_ID
,Cust_Name
,ISNULL(AmountSpent ,0) AmountSpent
FROM cte WHERE rn=1
My more general solution is similar to peter.petrov's, but his didn't work "out of the box" on my sample data. That might be an issue with my sample data or not. Differences include use of CTE and a NOT EXISTS with a correlated subquery.
CREATE TABLE Customers (Cust_ID INT, Cust_Name VARCHAR(10))
CREATE TABLE Products (Prod_ID INT, Prod_Price MONEY)
CREATE TABLE Orders (Order_ID INT, Cust_ID INT, Prod_ID INT, Quantity INT, Order_Date DATE)
INSERT INTO Customers SELECT 1 ,'Able'
UNION SELECT 2, 'Bob'
UNION SELECT 3, 'Charlie'
INSERT INTO Products SELECT 1, 10.0
INSERT INTO Orders SELECT 1, 1, 1, 1, GetDate()
UNION SELECT 2, 1, 1, 1, GetDate()
UNION SELECT 3, 1, 1, 1, GetDate()
UNION SELECT 4, 2, 1, 1, GetDate()
UNION SELECT 5, 2, 1, 1, GetDate()
UNION SELECT 6, 3, 1, 1, GetDate()
;WITH CustomersFirstOrder AS (
SELECT Cust_ID
,MIN(Order_ID) Order_ID
FROM Orders
GROUP BY Cust_ID
)
SELECT c.Cust_ID
,c.Cust_Name
,ISNULL(SUM(Quantity * Prod_Price),0) CustomerOrderTotalAfterInitialPurchase
FROM Customers c
LEFT JOIN (
SELECT Cust_ID
,Quantity
,Prod_Price
FROM Orders o
INNER JOIN
Products p ON o.Prod_ID = p.Prod_ID
WHERE NOT EXISTS (SELECT 1 FROM CustomersFirstOrder a WHERE a.Order_ID=o.Order_ID)
) b ON c.Cust_ID = b.Cust_ID
GROUP BY c.Cust_ID
,c.Cust_Name
DROP TABLE Customers
DROP TABLE Products
DROP TABLE Orders
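Here is a rough, runnable version of the script above on SQLite via Python's sqlite3 module (an assumption, made so that it can be tried without SQL Server). It uses the same three customers and six orders; ISNULL becomes COALESCE, MONEY becomes REAL, and GetDate() is replaced by a fixed date, since this approach picks the first order by MIN(Order_ID) and never looks at the date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (Cust_ID INTEGER, Cust_Name TEXT);
    CREATE TABLE Products  (Prod_ID INTEGER, Prod_Price REAL);
    CREATE TABLE Orders (Order_ID INTEGER, Cust_ID INTEGER, Prod_ID INTEGER,
                         Quantity INTEGER, Order_Date TEXT);

    INSERT INTO Customers VALUES (1, 'Able'), (2, 'Bob'), (3, 'Charlie');
    INSERT INTO Products  VALUES (1, 10.0);
    -- Fixed placeholder date instead of GetDate().
    INSERT INTO Orders VALUES
        (1, 1, 1, 1, '2014-01-01'), (2, 1, 1, 1, '2014-01-01'),
        (3, 1, 1, 1, '2014-01-01'), (4, 2, 1, 1, '2014-01-01'),
        (5, 2, 1, 1, '2014-01-01'), (6, 3, 1, 1, '2014-01-01');
    """
)

rows = conn.execute(
    """
    WITH CustomersFirstOrder AS (
        SELECT Cust_ID, MIN(Order_ID) AS Order_ID
          FROM Orders
         GROUP BY Cust_ID
    )
    SELECT c.Cust_ID, c.Cust_Name,
           COALESCE(SUM(b.Quantity * b.Prod_Price), 0) AS TotalAfterFirst
      FROM Customers AS c
      LEFT JOIN (
            SELECT o.Cust_ID, o.Quantity, p.Prod_Price
              FROM Orders AS o
              JOIN Products AS p ON o.Prod_ID = p.Prod_ID
             -- keep only orders that are not some customer's first order
             WHERE NOT EXISTS (SELECT 1
                                 FROM CustomersFirstOrder AS a
                                WHERE a.Order_ID = o.Order_ID)
           ) AS b ON c.Cust_ID = b.Cust_ID
     GROUP BY c.Cust_ID, c.Cust_Name
     ORDER BY c.Cust_ID
    """
).fetchall()

print(rows)
```

Because of the LEFT JOIN, Charlie (one order only) is still listed, with a total of 0.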
Try this. It should do it.
SELECT c1.cust_name ,
c1.cust_id ,
SUM(o1.Quantity * p1.Prod_Price)
FROM orders o1
JOIN products p1 ON o1.prod_id = p1.prod_id
JOIN customers c1 ON o1.cust_id = c1.cust_id
LEFT JOIN ( SELECT o2.cust_id ,
MIN(o2.Order_Date) AS Order_Date
FROM orders o2
GROUP BY o2.cust_id
) t ON o1.cust_id = t.cust_id
AND o1.Order_Date = t.Order_Date
WHERE t.Order_Date IS NULL
GROUP BY c1.cust_name ,
c1.cust_id
You have to number orders by Customer and then you can have the amount for the first order and next orders with a CTE and ROW_NUMBER() like this:
; WITH NumberedOrders
AS ( SELECT Customers.Cust_Id ,
Customers.Cust_Name ,
ROW_NUMBER() OVER ( PARTITION BY Customers.Cust_Id ORDER BY Orders.Order_Date ) AS Order_Number ,
Orders.Order_Date ,
Products.Prod_price * Orders.Quantity AS Amount
FROM Orders
INNER JOIN Customers ON Orders.Cust_Id = Customers.Cust_Id
INNER JOIN Products ON Orders.Prod_Id = Products.Prod_Id
)
SELECT Cust_Id ,
SUM(CASE WHEN Order_Number = 1 THEN Amount
ELSE 0
END) AS A_First_Order ,
SUM(CASE WHEN Order_Number = 1 THEN 0
ELSE Amount
END) AS B_Other_orders ,
SUM(Amount) AS C_All_orders
FROM NumberedOrders
GROUP BY Cust_Id
ORDER BY Cust_Id

select orders from first time customers

I need help building a SQL query that returns orders from customers who have only ordered once.
The tables and relevant fields are as follows:
Order Customer
------- -----------
orderId customerId
orderDate
customerId
etc.
I'm looking for a result set of Order records where there is only one occurrence of the customer id. For the following data set...
[orderId] [customerId] [orderDate] [etc.]
---------- ------------ ------------ ------------
o1 c1 1/1/14 foo
o2 c2 1/1/14 baz
o3 c3 1/3/14 bar
o4 c2 1/3/14 wibble
I would like the results to be
[orderId] [orderDate] [etc.]
--------- ----------- ------
o1 1/1/14 foo
o3 1/3/14 bar
Orders o2 and o4 are omitted because c2 has ordered twice.
Any help would be greatly appreciated.
Sorry, didn't put my failed attempt. This is what I tried...
SELECT customerId,
orderId,
orderDate,
Count(*)
FROM Orders
GROUP BY orderId,
orderDate,
customerID
HAVING Count(*) = 1
ORDER BY orderId
It appears to return all the orders.
Try the following (assuming SQL Server 2005+):
;WITH CTE AS
(
SELECT *,
N = COUNT(*) OVER(PARTITION BY customerId)
FROM Orders
)
SELECT *
FROM CTE
WHERE N = 1
Since sometimes a pedestrian approach is preferred over complex CTEs, you can use a derived table if you want (but since it's using the OVER clause, you'll still need SQL Server 2005+):
SELECT *
FROM ( SELECT *,
N = COUNT(*) OVER(PARTITION BY customerId)
FROM Orders) T
WHERE N = 1
Alternatively (if for example you are in an older than 2005 version of SQL-Server), you can use the GROUP BY / HAVING COUNT(*)=1 method to find customers with only 1 order and then join back to the Orders table (no need for aggregate functions in all the columns):
SELECT o.*
FROM Orders o
JOIN
( SELECT customerId
FROM Orders
GROUP BY customerId
HAVING COUNT(*) = 1
) c
ON c.customerId = o.customerId ;
or use NOT EXISTS (no COUNT() needed and it works even in MySQL):
SELECT o.*
FROM Orders o
WHERE NOT EXISTS
( SELECT 1
FROM Orders c
WHERE c.customerId = o.customerId
AND c.orderId <> o.orderId
) ;
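A quick sketch to compare the last two variants on the sample data from the question, run on SQLite via Python's sqlite3 module (an assumption; both queries are plain SQL and should behave the same elsewhere). It also shows why the attempt in the question returns everything: grouping by orderId puts every order in its own group, so COUNT(*) is always 1. The constant names JOIN_BACK and NOT_EXISTS are just labels for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (orderId TEXT PRIMARY KEY, customerId TEXT,
                         orderDate TEXT, etc TEXT);
    -- Sample rows copied from the question.
    INSERT INTO Orders VALUES
        ('o1', 'c1', '1/1/14', 'foo'),
        ('o2', 'c2', '1/1/14', 'baz'),
        ('o3', 'c3', '1/3/14', 'bar'),
        ('o4', 'c2', '1/3/14', 'wibble');
    """
)

# Find single-order customers first, then join back to get the full rows.
JOIN_BACK = """
    SELECT o.orderId, o.orderDate, o.etc
      FROM Orders AS o
      JOIN (SELECT customerId
              FROM Orders
             GROUP BY customerId
            HAVING COUNT(*) = 1) AS c
        ON c.customerId = o.customerId
     ORDER BY o.orderId
"""

# Keep an order only if the same customer has no other order.
NOT_EXISTS = """
    SELECT o.orderId, o.orderDate, o.etc
      FROM Orders AS o
     WHERE NOT EXISTS (SELECT 1
                         FROM Orders AS c
                        WHERE c.customerId = o.customerId
                          AND c.orderId <> o.orderId)
     ORDER BY o.orderId
"""

# The grouping from the question: one group per order, so nothing is filtered.
QUESTION_ATTEMPT = """
    SELECT orderId
      FROM Orders
     GROUP BY orderId, orderDate, customerId
    HAVING COUNT(*) = 1
     ORDER BY orderId
"""

join_back_rows = conn.execute(JOIN_BACK).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS).fetchall()
attempt_rows = conn.execute(QUESTION_ATTEMPT).fetchall()

print(join_back_rows)  # [('o1', '1/1/14', 'foo'), ('o3', '1/3/14', 'bar')]
print(len(attempt_rows))  # 4
```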
This will list all first-time customers in your ORDERS table.
SELECT [customerID],
MIN([orderId]) AS [orderId],
MIN([orderDate]) AS [orderDate],
MIN([etc.]) AS [etc.]
FROM [Orders]
GROUP BY [customerID]
HAVING Count(*) = 1
ORDER BY [customerID]
In order to bring back all the additional columns you would need to wrap them in an aggregate such as MIN/MAX.
It is arbitrary which to use as there will only be one row per group anyway. This does assume that all columns in the table are of datatypes valid for such aggregation however (examples of datatypes that aren't are BIT, or XML)

Last order item in Oracle SQL

I need to list columns from customer table, the date from first order and all data from last one, in a 1:N relationship between customer and order tables. I'm using Oracle 10g.
What is the best way to do that?
TABLE CUSTOMER
---------------
id NUMBER
name VARCHAR2(200)
subscribe_date DATE
TABLE ORDER
---------------
id NUMBER
id_order NUMBER
purchase_date DATE
purchase_value NUMBER
Here is one way of doing it, using the row_number function, one join, and one aggregation:
select c.id, c.name, c.subscribe_date,
min(o.purchase_date) as FirstPurchaseDate,
min(case when seqnum = 1 then o.id_order end) as Last_IdOrder,
min(case when seqnum = 1 then o.purchase_date end) as Last_PurchaseDate,
min(case when seqnum = 1 then o.purchase_value end) as Last_PurchaseValue
from Customer c join
(select o.*,
row_number() over (partition by o.id order by purchase_date desc) as seqnum
from orders o
) o
on c.id = o.id -- assumes orders.id holds the customer id
group by c.id, c.name, c.subscribe_date
It's not obvious how to join the customer table to the orders table (order is a reserved word in Oracle, so your table can't be named order). If we assume that id_order in orders joins to id in customer:
SELECT c.id customer_id,
c.name name,
c.subscribe_date,
o.first_purchase_date,
o.id last_order_id,
o.purchase_date last_order_purchase_date,
o.purchase_value last_order_purchase_value
FROM customer c
JOIN (SELECT o.*,
min(o.purchase_date) over (partition by id_order) first_purchase_date,
rank() over (partition by id_order order by purchase_date desc) rnk
FROM orders o) o ON (c.id = o.id_order)
WHERE rnk = 1
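Here is a small sketch of that query on SQLite via Python's sqlite3 module instead of Oracle 10g (an assumption; it needs SQLite 3.25 or later for window functions). It keeps the same guess that id_order in orders holds the customer id, and the sample rows are invented. Note that RANK() returns more than one row per customer if the latest purchase_date is tied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, subscribe_date TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, id_order INTEGER,
                         purchase_date TEXT, purchase_value INTEGER);

    -- Hypothetical sample rows; id_order is treated as the customer id.
    INSERT INTO customer VALUES (1, 'Ann', '2012-12-01'), (2, 'Ben', '2013-01-15');
    INSERT INTO orders VALUES
        (101, 1, '2013-01-05', 50),
        (102, 1, '2013-03-10', 75),
        (103, 2, '2013-02-01', 20);
    """
)

rows = conn.execute(
    """
    SELECT c.id, c.name,
           lo.first_purchase_date,
           lo.id             AS last_order_id,
           lo.purchase_date  AS last_order_purchase_date,
           lo.purchase_value AS last_order_purchase_value
      FROM customer AS c
      JOIN (SELECT o.*,
                   -- earliest purchase per customer, repeated on every row
                   MIN(o.purchase_date) OVER (PARTITION BY o.id_order)
                       AS first_purchase_date,
                   -- 1 marks the most recent order per customer
                   RANK() OVER (PARTITION BY o.id_order
                                ORDER BY o.purchase_date DESC) AS rnk
              FROM orders AS o) AS lo
        ON c.id = lo.id_order
     WHERE lo.rnk = 1
     ORDER BY c.id
    """
).fetchall()

for row in rows:
    print(row)
```

Each customer comes back once, with the first purchase date alongside the columns of the most recent order.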
I'm confused by your field names, but I'm going to assume that ORDER.id is the id in the CUSTOMER table.
The earliest order date is easy.
select CUSTOMER.*, min(ORDER.purchase_date)
from CUSTOMER
inner join ORDER on CUSTOMER.id = ORDER.id
group by CUSTOMER.*
To get the last order data, join this to the ORDER table again.
select CUSTOMER.*, min(ORD_FIRST.purchase_date), ORD_LAST.*
from CUSTOMER
inner join ORDER ORD_FIRST on CUSTOMER.id = ORD_FIRST.id
inner join ORDER ORD_LAST on CUSTOMER.id = ORD_LAST.id
group by CUSTOMER.*, ORD_LAST.*
having ORD_LAST.purchase_date = max(ORD_FIRST.purchase_date)
Maybe something like this assuming the ID field in the Order table is actually the Customer ID:
SELECT C.*, O1.*, O2.purchase_Date as FirstPurchaseDate
FROM Customer C
LEFT JOIN
(
SELECT Max(purchase_date) as pdate, id
FROM Orders
GROUP BY id
) MaxPurchaseOrder
ON C.Id = MaxPurchaseOrder.Id
LEFT JOIN Orders O1
ON MaxPurchaseOrder.pdate = O1.purchase_date
AND MaxPurchaseOrder.id = O1.id
LEFT JOIN
(
SELECT Min(purchase_date) as pdate, id
FROM Orders
GROUP BY id
) MinPurchaseOrder
ON C.Id = MinPurchaseOrder.Id
LEFT JOIN Orders O2
ON MinPurchaseOrder.pdate = O2.purchase_date
AND MinPurchaseOrder.id = O2.id
And the sql fiddle.