Counting table rows where a column value first appears after date XYZ - sql

We have an orders table that looks like so:
orderId customerId orderDate
320 45 2020-01-01
455 67 2021-02-11
122 45 2019-04-22
Based on this I need to count all 'new' customers that first entered the system after date XYZ.
I'm thinking of something involving a having clause but wondered if there was a better way to go about it. Something along these lines (SQL may not be exact, but the general idea):
select count(*) from (select distinct(customerId) from orders group by customerId having min(orders.orderDate) > XYZ) as foo
Is there a better / faster way to go about this?

Assuming you wanted the count of new customers coming into the system after 2021-02-11, you could try:
SELECT COUNT(DISTINCT customerId)
FROM orders o1
WHERE
orderDate > '2021-02-11' AND
NOT EXISTS (SELECT 1 FROM orders o2
WHERE o2.customerId = o1.customerId AND o2.orderDate <= '2021-02-11');
In plain English, the logic counts every customer who appears in a record after 2021-02-11 and who does not also appear in any record on or before 2021-02-11.
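The same NOT EXISTS query can be tried out with Python's built-in sqlite3 module, used here only as a convenient stand-in database. This is a minimal sketch that assumes the three sample rows from the question, ISO-8601 date strings (which compare correctly as text) and a hypothetical helper name, `count_new_customers`:

```python
import sqlite3

# Sample rows from the question: (orderId, customerId, orderDate)
ROWS = [(320, 45, "2020-01-01"), (455, 67, "2021-02-11"), (122, 45, "2019-04-22")]


def make_db():
    """Build an in-memory orders table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (orderId INTEGER, customerId INTEGER, orderDate TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    return conn


def count_new_customers(conn, cutoff):
    """Count customers with an order after `cutoff` and none on or before it."""
    sql = """
        SELECT COUNT(DISTINCT customerId)
        FROM orders o1
        WHERE o1.orderDate > :cutoff
          AND NOT EXISTS (SELECT 1 FROM orders o2
                          WHERE o2.customerId = o1.customerId
                            AND o2.orderDate <= :cutoff)
    """
    return conn.execute(sql, {"cutoff": cutoff}).fetchone()[0]


conn = make_db()
print(count_new_customers(conn, "2020-01-01"))  # prints 1 (customer 67)
print(count_new_customers(conn, "2021-02-11"))  # prints 0 (no order is after this date)
```

With a cutoff of 2020-01-01 only customer 67 counts as new, because customer 45 had already ordered on 2019-04-22.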

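Your query is already fine. Another option is to number each customer's orders with ROW_NUMBER() partitioned by customerId and count only the first one per customer (an alternative to the DISTINCT keyword). The date filter has to be applied after the numbering; filtering first would make a returning customer's first order after the date look like their first order ever:
select count(1) from (select
orderDate,
row_number() over (partition by customerId order by orderDate asc) rn
from orders) t1
where rn = 1 and orderDate > '2020-01-01'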
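The difference between numbering before and after the date filter can be checked in SQLite (3.25 or later, for window functions) through Python's sqlite3 module. In this sketch the cutoff 2019-12-31 is an arbitrary illustrative value, chosen so that the two variants disagree on the question's sample rows:

```python
import sqlite3

ROWS = [(320, 45, "2020-01-01"), (455, 67, "2021-02-11"), (122, 45, "2019-04-22")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderId INTEGER, customerId INTEGER, orderDate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)

# Number every customer's orders oldest-first, THEN keep first orders after the cutoff.
FIRST_ORDER_SQL = """
    SELECT COUNT(*) FROM (
        SELECT orderDate,
               ROW_NUMBER() OVER (PARTITION BY customerId ORDER BY orderDate) AS rn
        FROM orders
    ) t
    WHERE rn = 1 AND orderDate > ?
"""

# For contrast: filtering before numbering counts every customer with ANY later order.
FILTER_FIRST_SQL = """
    SELECT COUNT(*) FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY customerId ORDER BY orderDate) AS rn
        FROM orders
        WHERE orderDate > ?
    ) t
    WHERE rn = 1
"""


def count(sql, cutoff):
    return conn.execute(sql, (cutoff,)).fetchone()[0]


print(count(FIRST_ORDER_SQL, "2019-12-31"))   # prints 1: only customer 67 is new
print(count(FILTER_FIRST_SQL, "2019-12-31"))  # prints 2: customer 45 is wrongly included
```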

You are already using this method:
select count(*)
from (select customerId, min(orderDate) as first_orderDate
from orders o
group by customerId
having min(orderDate) >= '2021-02-11'
) oc;
For performance, I would suggest using the customers table:
select count(*)
from customers c
where not exists (select 1
from orders o
where o.customerId = c.customerId and
o.orderDate < '2021-02-11'
);
For this, you want an index on orders(customerId, orderDate).
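A sketch comparing the HAVING MIN(...) query with the customers-table version, again in SQLite through Python's sqlite3 module. The `customers` table, its contents and the index name are assumptions for illustration; customer 99 is a hypothetical customer with no orders at all:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderId INTEGER, customerId INTEGER, orderDate TEXT);
    -- Hypothetical customers table; 99 is a customer who has never ordered.
    CREATE TABLE customers (customerId INTEGER PRIMARY KEY);
    INSERT INTO orders VALUES (320, 45, '2020-01-01'), (455, 67, '2021-02-11'), (122, 45, '2019-04-22');
    INSERT INTO customers VALUES (45), (67), (99);
    -- Composite index so the NOT EXISTS probe can seek on (customerId, orderDate).
    CREATE INDEX ix_orders_customer_date ON orders (customerId, orderDate);
""")

# Customers whose first order falls on or after the cutoff.
HAVING_SQL = """
    SELECT COUNT(*) FROM (
        SELECT customerId FROM orders
        GROUP BY customerId
        HAVING MIN(orderDate) >= ?
    ) oc
"""

# Customers with no order before the cutoff (this includes customers with no orders).
NOT_EXISTS_SQL = """
    SELECT COUNT(*) FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o
                      WHERE o.customerId = c.customerId AND o.orderDate < ?)
"""

cutoff = "2021-02-11"
having_count = conn.execute(HAVING_SQL, (cutoff,)).fetchone()[0]
not_exists_count = conn.execute(NOT_EXISTS_SQL, (cutoff,)).fetchone()[0]
print(having_count, not_exists_count)  # prints 1 2
```

The counts differ only because of customer 99: the NOT EXISTS form counts a customer who has never ordered as "new", so whether that is wanted depends on what "entered the system" means for your data.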


Grouping in SQL using CASE Statements

Hello, I am trying to group multiple customer orders into buckets in SQL; the output should look something like the table below. Do I have to use a CASE statement to group them?
Table1 looks like:
CustomerID Order_date
1 somedate
2 somedate
3 somedate
2 somedate
Edit: by "# of customers" I mean that if CustomerID 2 had 2 orders, he/she would be counted in the bucket where "# of orders" is 2.
The output should be something like this:
# of Customers | # of Orders
2 | 1
1 | 2
My code so far is:
select count(*) CustomerID
FROM Table1
GROUP BY CustomerID;
Use a double aggregation:
SELECT COUNT(*) AS num_customers, cnt AS num_orders
FROM
(
SELECT CustomerID, COUNT(*) AS cnt
FROM Table1
GROUP BY CustomerID
) t
GROUP BY cnt;
The inner subquery finds the number of orders for each customer. The outer query then aggregates by number of orders and finds out the number of customers having each number of orders.
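A minimal check of the double aggregation, using Python's sqlite3 module as a stand-in database, with made-up dates in place of the question's "somedate" placeholders (the dates play no role in the result):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (CustomerID INTEGER, Order_date TEXT)")
# Hypothetical dates standing in for "somedate"; only CustomerID matters here.
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(1, "2021-01-01"), (2, "2021-01-02"), (3, "2021-01-03"), (2, "2021-01-04")])

rows = conn.execute("""
    SELECT COUNT(*) AS num_customers, cnt AS num_orders
    FROM (SELECT CustomerID, COUNT(*) AS cnt   -- orders per customer
          FROM Table1
          GROUP BY CustomerID) t
    GROUP BY cnt                                -- customers per order count
    ORDER BY cnt
""").fetchall()

for num_customers, num_orders in rows:
    print(num_customers, num_orders)
```

Each output row reads as "this many customers placed this many orders", matching the expected output in the question.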
If you want to list your users sorted by the number of orders they made, this query should work:
SELECT CustomerID, COUNT(CustomerID) as NbOrder
FROM Table1
GROUP BY CustomerID
ORDER BY NbOrder
I believe what you want to do is get the count of orders by customer, first, via aggregation. Then get the count of customers by order count from that query.
SELECT count(*) as count_of_customers, count_of_orders
FROM
(
SELECT customerid, count(*) as count_of_orders
FROM your_table
GROUP BY customerid
) sub
GROUP BY count_of_orders
ORDER BY count_of_orders

Access SQL - Delete records with same identifiers based on criteria

I have a database with multiple records with the same identifier. I want to remove just one of those records.
OrderNum Cost
10001 254
10002 343
10002 300
10003 435
10003 323
For the above table, let's say I just want to delete, among records with duplicate order numbers, the one with the smaller cost. For example, for order 10002, keep the record with a cost of 343 and delete the one with the smaller cost, 300.
Here is the query I have come up with; however, I am using the cost to identify the duplicate, which is bad if the same cost appears somewhere else in the table.
DELETE Orders.*
FROM Orders
WHERE (cost In
(Select min(cost) FROM Orders
GROUP BY [OrderNum] HAVING Count(*) > 1))
How can I write the query so that it uses the order number and deletes the smaller-cost record of each duplicated order?
I'll explain the solution in stages:
SELECT OrderNum, Min(Cost) as MinCost
FROM Orders
GROUP BY OrderNum
HAVING COUNT(*) > 1
This returns records you intend to delete:
OrderNum MinCost
10002 300
10003 323
The following is another version of the same query using sub-SELECTs:
SELECT *
FROM
(
SELECT OrderNum, Min(Cost) as MinCost
FROM Orders
GROUP BY OrderNum
HAVING COUNT(*) > 1
) M
We want to join the records marked for deletion back to the Orders table; one way to achieve this is with an EXISTS clause:
SELECT *
FROM Orders O
WHERE EXISTS (
SELECT *
FROM
(
SELECT OrderNum, Min(Cost) as MinCost
FROM Orders
GROUP BY OrderNum
HAVING COUNT(*) > 1
) M
WHERE O.OrderNum = M.OrderNum
AND O.Cost = M.MinCost
)
Now that we've mastered the SELECT statement needed, we turn it into the DELETE statement:
DELETE
FROM Orders O
WHERE EXISTS (
SELECT *
FROM
(
SELECT OrderNum, Min(Cost) as MinCost
FROM Orders
GROUP BY OrderNum
HAVING COUNT(*) > 1
) M
WHERE O.OrderNum = M.OrderNum
AND O.Cost = M.MinCost
)
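The final DELETE can be exercised on the sample table in SQLite through Python's sqlite3 module. This is a sketch, not Access itself: the alias O is replaced by the table name `Orders` to keep the statement portable, which correlates the subquery in the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderNum INTEGER, Cost INTEGER)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(10001, 254), (10002, 343), (10002, 300), (10003, 435), (10003, 323)])

# Delete rows whose Cost equals their order's minimum, but only for duplicated OrderNums.
conn.execute("""
    DELETE FROM Orders
    WHERE EXISTS (
        SELECT 1
        FROM (SELECT OrderNum, MIN(Cost) AS MinCost
              FROM Orders
              GROUP BY OrderNum
              HAVING COUNT(*) > 1) M
        WHERE M.OrderNum = Orders.OrderNum
          AND M.MinCost = Orders.Cost
    )
""")

remaining = conn.execute("SELECT OrderNum, Cost FROM Orders ORDER BY OrderNum").fetchall()
print(remaining)
```

Order 10001 survives because it has no duplicate, so the HAVING COUNT(*) > 1 filter never marks it.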
If you have large amounts of data, you may wish to create an index to optimize the join:
CREATE INDEX IX_Orders_001 ON Orders (OrderNum, Cost);
What you really want is something like:
WHERE (ordernum, cost) IN (SELECT ordernum, min(cost) as cost FROM Orders GROUP BY OrderNum HAVING COUNT(*) > 1);
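But Access doesn't support row-value (tuple) comparisons like this, as many larger RDBMSs do.
Instead, you could concatenate the values of each tuple, with a separator so that two different pairs cannot produce the same string:
WHERE ordernum & "|" & cost IN (SELECT ordernum & "|" & min(cost) FROM Orders GROUP BY OrderNum HAVING Count(*) > 1);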
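SQLite (3.15 and later) is one engine that does support the tuple form, so both variants can be compared there through Python's sqlite3 module. In this sketch SQLite's `||` operator stands in for Access's `&`, and the `'|'` separator is a precaution: without it, pairs such as (1000, 21) and (10002, 1) would both concatenate to 100021:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderNum INTEGER, Cost INTEGER)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(10001, 254), (10002, 343), (10002, 300), (10003, 435), (10003, 323)])

SUBQUERY = "SELECT OrderNum, MIN(Cost) FROM Orders GROUP BY OrderNum HAVING COUNT(*) > 1"

# Row-value (tuple) comparison: supported by SQLite, not by Access.
tuple_rows = conn.execute(
    f"SELECT OrderNum, Cost FROM Orders WHERE (OrderNum, Cost) IN ({SUBQUERY}) ORDER BY OrderNum"
).fetchall()

# Concatenation fallback; the '|' separator keeps distinct pairs from colliding.
concat_rows = conn.execute("""
    SELECT OrderNum, Cost FROM Orders
    WHERE OrderNum || '|' || Cost IN
          (SELECT OrderNum || '|' || MIN(Cost) FROM Orders GROUP BY OrderNum HAVING COUNT(*) > 1)
    ORDER BY OrderNum
""").fetchall()

print(tuple_rows)   # the rows that would be deleted
print(concat_rows)  # same rows via string matching
```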
In SQL Server (Access does not support window functions), this will remove all duplicates except the one with the largest cost for each order number:
with b as (
select *, row_number() over (partition by ordernum order by cost desc) rownum
from yourtable
)
delete from b
where rownum <> 1
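SQLite has window functions but, unlike SQL Server, cannot delete through a CTE, so this sketch (run through Python's sqlite3 module) keeps the same ROW_NUMBER() ranking and identifies rows by SQLite's built-in rowid instead; the `rid` alias is an arbitrary name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (ordernum INTEGER, cost INTEGER)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?)",
                 [(10001, 254), (10002, 343), (10002, 300), (10003, 435), (10003, 323)])

# Rank each order's rows by cost, highest first, and delete everything ranked below 1.
conn.execute("""
    DELETE FROM yourtable
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY ordernum ORDER BY cost DESC) AS rownum
            FROM yourtable
        )
        WHERE rownum <> 1
    )
""")

remaining = conn.execute("SELECT ordernum, cost FROM yourtable ORDER BY ordernum").fetchall()
print(remaining)
```

Because ROW_NUMBER() gives tied rows distinct numbers, exactly one row per order number survives even if two rows share the largest cost.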
You can use a JOIN to delete the rows below the largest cost of each OrderNum, as shown below:
DELETE Orders.*
FROM Orders
INNER JOIN (Select OrderNum, max(cost) as cost FROM Orders
GROUP BY [OrderNum] HAVING Count(*) > 1) as R
on Orders.OrderNum=R.OrderNum and Orders.cost < R.cost

select last order date for each customer id

I have a list of customerids, orderids and order dates that I want to use in another query to determine if the customer has ordered again since this date.
Example Data:
CustomerID OrderID OrderDate
6619 16034 2012-11-15 10:23:02.603
6858 18482 2013-03-25 11:07:14.680
4784 17897 2013-02-20 14:45:43.640
5522 16188 2012-11-22 14:53:49.840
6803 18016 2013-02-28 10:41:16.713
Query:
SELECT dbo.[Order].CustomerID, dbo.[Order].OrderID, dbo.[Order].OrderDate
FROM dbo.[Order] INNER JOIN
dbo.OrderLine ON dbo.[Order].OrderID = dbo.OrderLine.OrderID
WHERE (dbo.OrderLine.ProductID in (42, 44, 45, 46,47,48))
If you need anything else, just ask.
UPDATE:
This query brings back the results shown above.
I need to know whether the customer has ordered again since then, for any product ID, after ordering one of the products in the query above.
Mike
If you are only interested in the last order date for each customer:
select customerid, max(orderdate) from theTable group by customerid;
In MS SQL, you can use TOP 1 for this; you also need to order by your order date column in descending order.
See also: SQL Server - How to select the most recent record per user?
SELECT dbo.[Order].CustomerID, MAX(dbo.[Order].OrderDate)
FROM dbo.[Order] INNER JOIN
dbo.OrderLine ON dbo.[Order].OrderID = dbo.OrderLine.OrderID
WHERE (dbo.OrderLine.ProductID in (42, 44, 45, 46,47,48))
GROUP BY dbo.[Order].CustomerID
This gets the latest order date for each customer.
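A sketch of the join-plus-MAX query in SQLite through Python's sqlite3 module. Orders 16034 and 18482 come from the question's sample data; order 17000 and all OrderLine rows are invented for illustration, and the dbo schema prefix is dropped because SQLite does not use it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Order] (OrderID INTEGER, CustomerID INTEGER, OrderDate TEXT);
    CREATE TABLE OrderLine (OrderID INTEGER, ProductID INTEGER);
    -- Order 17000 and every OrderLine row are hypothetical.
    INSERT INTO [Order] VALUES
        (16034, 6619, '2012-11-15 10:23:02.603'),
        (17000, 6619, '2013-01-05 09:00:00.000'),
        (18482, 6858, '2013-03-25 11:07:14.680');
    INSERT INTO OrderLine VALUES (16034, 42), (17000, 44), (17000, 99), (18482, 45);
""")

latest = conn.execute("""
    SELECT o.CustomerID, MAX(o.OrderDate) AS LastOrderDate
    FROM [Order] o
    JOIN OrderLine ol ON ol.OrderID = o.OrderID
    WHERE ol.ProductID IN (42, 44, 45, 46, 47, 48)
    GROUP BY o.CustomerID
    ORDER BY o.CustomerID
""").fetchall()
print(latest)
```

The timestamps are stored as fixed-width text, so MAX() on the strings returns the latest date.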
ROW_NUMBER in a CTE should work:
WITH cte
AS (SELECT customerid,
orderid,
orderdate,
rn = Row_number()
OVER(
partition BY customerid
ORDER BY orderdate DESC)
FROM dbo.tblorder
WHERE orderdate >= @orderDate
AND customerid = @customerID)
SELECT customerid, orderid, orderdate
FROM cte
WHERE rn = 1
(I've omitted the join since no column from the other table was needed; simply add it back.)
CustomerID and latest OrderDate for customers that have ordered any product after ordering any of a set of products
I suspect they were promotional products
SELECT [Order].[CustomerID], max([Order].[OrderDate])
FROM [Order]
JOIN [Order] as [OrderBase]
ON [OrderBase].[CustomerID] = [Order].[CustomerID]
AND [OrderBase].[OrderDate] < [Order].[OrderDate]
JOIN [OrderLine]
ON [OrderLine].[OrderID] = [OrderBase].[OrderID]
AND [OrderLine].[ProductID] in (42,44,45,46,47,48)
GROUP BY [Order].[CustomerID]
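A sketch of this self-join in SQLite through Python's sqlite3 module. The product IDs come from the question; order 17000 and all OrderLine rows are hypothetical. Customer 6619 orders again after buying product 42, while customer 6858 never orders again:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Order] (OrderID INTEGER, CustomerID INTEGER, OrderDate TEXT);
    CREATE TABLE OrderLine (OrderID INTEGER, ProductID INTEGER);
    INSERT INTO [Order] VALUES
        (16034, 6619, '2012-11-15 10:23:02.603'),
        (17000, 6619, '2013-01-05 09:00:00.000'),
        (18482, 6858, '2013-03-25 11:07:14.680');
    -- Hypothetical lines: 17000 contains only a non-listed product.
    INSERT INTO OrderLine VALUES (16034, 42), (17000, 99), (18482, 45);
""")

# Latest order per customer that follows an earlier order containing a listed product.
REORDER_SQL = """
    SELECT [Order].CustomerID, MAX([Order].OrderDate)
    FROM [Order]
    JOIN [Order] AS OrderBase
      ON OrderBase.CustomerID = [Order].CustomerID
     AND OrderBase.OrderDate < [Order].OrderDate
    JOIN OrderLine
      ON OrderLine.OrderID = OrderBase.OrderID
     AND OrderLine.ProductID IN (42, 44, 45, 46, 47, 48)
    GROUP BY [Order].CustomerID
"""

reordered = conn.execute(REORDER_SQL).fetchall()
print(reordered)
```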

Sql query to select records for a single distinct customer id

I have written a query for sales grouped by customer; it is as follows:
SELECT customerid,
SUM (salestax1)As total_salestax1,
SUM(total_payment_received) As total_payment_recieved,
COUNT (orderid)as order_qty,
SUM(paymentamount)As paymentamount
FROM Orders_74V94W6D22$
WHERE orderdate between '7/6/2011 16:35' and '2/3/2012 11:53'
GROUP BY customerid
But this query shows only 5 fields, and I need to show the following fields:
orderid billingcompanyname billingfirstname billinglastname
billingcountry shipcountry paymentamount creditcardtransactionid
orderdate creditcardauthorizationdate orderstatus
total_payment_received tax1_title salestax1
How should I deal with this?
You need to understand what GROUP BY means.
If you are grouping by customerId, you will get only one row per customer, because all of that customer's data is grouped into it.
How do you want to group and still display the orderid in your result set? If a customer has 10 order IDs, do you expect 10 rows in the result? If so, fine, group by orderid too, but I don't think that's what you want.
EDIT:
Well, this is NOT a good idea: your table structure is WRONG, and I don't think you fully understand what a GROUP BY means. BUT I think this query will get your result:
SELECT customerid,
(select top 1 [column1] from Orders_74V94W6D22$ where customerid = ORD.customerid),
(select top 1 [column2] from Orders_74V94W6D22$ where customerid = ORD.customerid),
(select top 1 [column3] from Orders_74V94W6D22$ where customerid = ORD.customerid),
SUM (salestax1)As total_salestax1,
SUM(total_payment_received) As total_payment_recieved,
COUNT (orderid)as order_qty,
SUM(paymentamount)As paymentamount
FROM Orders_74V94W6D22$ ORD
WHERE orderdate between '7/6/2011 16:35' and '2/3/2012 11:53'
GROUP BY customerid
To select more about the customer, you need to use your query as a subquery, something like:
Select distinct c.[column1], c.[column2], c.[column3], tbl.*
From Orders_74V94W6D22$ c inner join (
SELECT customerid,
SUM (salestax1)As total_salestax1,
SUM(total_payment_received) As total_payment_recieved,
COUNT (orderid)as order_qty,
SUM(paymentamount)As paymentamount
FROM Orders_74V94W6D22$
WHERE orderdate between '7/6/2011 16:35' and '2/3/2012 11:53'
GROUP BY customerid
) as tbl on tbl.customerid = c.customerid
But you can't logically select something about one order, as you've grouped multiple orders.
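A sketch of the subquery-and-join approach in SQLite through Python's sqlite3 module. The table name `orders`, the reduced column list and the rows are hypothetical simplifications of Orders_74V94W6D22$, and the date range is written as ISO-8601 text on the assumption that '7/6/2011' means July 6:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid INTEGER, customerid INTEGER, "
             "billingfirstname TEXT, paymentamount REAL, orderdate TEXT)")
# Hypothetical rows; the real table has many more columns.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    (1, 100, "Ann", 10.0, "2011-08-01"),
    (2, 100, "Ann", 15.0, "2011-09-01"),
    (3, 200, "Bob", 7.5, "2011-10-01"),
])

# Aggregate per customer in a subquery, then join back for customer-level columns.
rows = conn.execute("""
    SELECT DISTINCT c.customerid, c.billingfirstname, tbl.order_qty, tbl.paymentamount
    FROM orders c
    JOIN (SELECT customerid,
                 COUNT(orderid) AS order_qty,
                 SUM(paymentamount) AS paymentamount
          FROM orders
          WHERE orderdate BETWEEN ? AND ?
          GROUP BY customerid) AS tbl
      ON tbl.customerid = c.customerid
    ORDER BY c.customerid
""", ("2011-07-06", "2012-02-03")).fetchall()
print(rows)
```

Adding orderid to the outer SELECT would return one row per order with the customer totals repeated on each, which is the limitation described above.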

SQL question about GROUP BY

I've been using SQL for a few years, and this type of problem comes up here and there, and I haven't found an answer. But perhaps I've been looking in the wrong places - I'm not really sure what to call it.
For the sake of brevity, let's say I have a table with 3 columns: Customer, Order_Amount, Order_Date. Each customer may have multiple orders, with one row for each order with the amount and date.
My Question: Is there a simple way in SQL to get the DATE of the maximum order per customer?
I can get the amount of the maximum order for each customer (and which customer made it) by doing something like:
SELECT Customer, MAX(Order_Amount) FROM orders GROUP BY Customer;
But I also want to get the date of the max order, which I haven't figured out a way to easily get. I would have thought that this would be a common type of question for a database, and would therefore be easy to do in SQL, but I haven't found an easy way to do it yet. Once I add Order_Date to the list of columns to select, I need to add it to the Group By clause, which I don't think will give me what I want.
Apart from joining to a grouped subquery, you can self-join and filter with HAVING:
SELECT o1.*
FROM orders o1 JOIN orders o2 ON o1.Customer = o2.Customer
GROUP BY o1.Customer, o1.Order_Amount, o1.Order_Date
HAVING o1.Order_Amount = MAX(o2.Order_Amount);
There's a good article reviewing various approaches.
In Oracle, DB2, Sybase, and SQL Server 2005+, you would use RANK() OVER:
SELECT * FROM (
SELECT t.*,
RANK() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) r
FROM orders t) o
WHERE r = 1;
Note: If a customer has more than one order with the maximum Order_Amount (i.e. ties), the RANK() function returns all such orders; to get only one of them, replace RANK() with ROW_NUMBER().
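To see the tie behaviour described in the note, here is a sketch in SQLite (3.25+ for window functions) through Python's sqlite3 module, with hypothetical data in which customer A has two orders tied at the maximum amount:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (Customer TEXT, Order_Amount INTEGER, Order_Date TEXT)")
# Hypothetical data: customer 'A' has two orders tied for the maximum amount.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("A", 50, "2020-01-05"), ("A", 80, "2020-02-10"), ("A", 80, "2020-03-15"),
    ("B", 30, "2020-01-20"), ("B", 20, "2020-04-01"),
])


def top_orders(rank_fn):
    """rank_fn is a trusted constant: "RANK" or "ROW_NUMBER"."""
    sql = f"""
        SELECT Customer, Order_Amount, Order_Date FROM (
            SELECT t.*,
                   {rank_fn}() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) AS r
            FROM orders t) o
        WHERE r = 1
        ORDER BY Customer, Order_Date
    """
    return conn.execute(sql).fetchall()


print(top_orders("RANK"))        # both tied orders for A, plus B's largest
print(top_orders("ROW_NUMBER"))  # exactly one order per customer
```

Which of the tied rows ROW_NUMBER() keeps is not defined; adding a tiebreaker such as Order_Date to the ORDER BY inside OVER makes it deterministic.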
There's no short-cut... the easiest way is probably to join to a sub-query:
SELECT
*
FROM
orders JOIN
(
SELECT Customer, MAX(Order_Amount) AS Max_Order_Amount
FROM orders
GROUP BY Customer
) maxOrder
ON maxOrder.Customer = orders.Customer
AND maxOrder.Max_Order_Amount = orders.Order_Amount
You will want to join the table to itself:
SELECT o.Customer, o.Order_Date, o2.amt
FROM orders o,
( SELECT Customer, MAX(Order_Amount) amt FROM orders GROUP BY Customer ) o2
WHERE o.customer = o2.customer
AND o.order_amount = o2.amt
;
Another approach for the collection:
WITH tempquery AS
(
SELECT
Customer
,Order_Amount
,Order_Date
,row_number() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) AS rn
FROM
orders
)
SELECT
Customer
,Order_Amount
,Order_Date
FROM
tempquery
WHERE
rn = 1
If your database supports CROSS APPLY, you can do this as well, but it doesn't handle ties correctly (TOP 1 returns a single order per customer):
SELECT [....]
FROM Customer c
CROSS APPLY
(SELECT TOP 1 [...]
FROM Orders o
WHERE c.customerID = o.CustomerID
ORDER BY o.Order_Amount DESC) o
You could try something like this:
SELECT Customer, MAX(Order_Amount), Order_Date
FROM orders O
WHERE ORDER_AMOUNT = (SELECT MAX(ORDER_AMOUNT) FROM orders WHERE CUSTOMER = O.CUSTOMER)
GROUP BY CUSTOMER, Order_Date
with t as
(
select CUSTOMER,Order_Date ,Order_Amount,max(Order_Amount) over (partition
by Customer) as
max_amount from orders
)
select * from t where t.Order_Amount=max_amount
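A sketch of the windowed MAX() approach in SQLite through Python's sqlite3 module, using hypothetical data with a tie for customer A; unlike ROW_NUMBER(), it keeps every order that matches the maximum:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (Customer TEXT, Order_Amount INTEGER, Order_Date TEXT)")
# Hypothetical data: customer 'A' has two orders tied for the maximum amount.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("A", 50, "2020-01-05"), ("A", 80, "2020-02-10"), ("A", 80, "2020-03-15"),
    ("B", 30, "2020-01-20"), ("B", 20, "2020-04-01"),
])

# Attach each customer's maximum to every row, then keep rows equal to it.
rows = conn.execute("""
    WITH t AS (
        SELECT Customer, Order_Date, Order_Amount,
               MAX(Order_Amount) OVER (PARTITION BY Customer) AS max_amount
        FROM orders
    )
    SELECT Customer, Order_Date, Order_Amount FROM t
    WHERE Order_Amount = max_amount
    ORDER BY Customer, Order_Date
""").fetchall()
print(rows)
```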