Getting date, and count of unique customers when first order was placed - sql

I have a table called orders that looks like this:
+--------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| memberid | int(11) | YES | | NULL | |
| deliverydate | date | YES | | NULL | |
+--------------+---------+------+-----+---------+-------+
It contains the following data:
+------+----------+--------------+
| id | memberid | deliverydate |
+------+----------+--------------+
| 1 | 991 | 2019-10-25 |
| 2 | 991 | 2019-10-26 |
| 3 | 992 | 2019-10-25 |
| 4 | 992 | 2019-10-25 |
| 5 | 993 | 2019-10-24 |
| 7 | 994 | 2019-10-21 |
| 6 | 994 | 2019-10-26 |
| 8 | 995 | 2019-10-26 |
+------+----------+--------------+
I would like a result set returning each unique date, and a separate column showing how many customers placed their first order on that day.
I'm having problems querying this the right way, especially when the data contains multiple orders from the same customer on the same day.
My approach has been to:
1. Get all unique memberids that placed an order during the time period I want to look at.
2. Keep only the ones that placed their first order during the period, by excluding the memberids that placed an order before the time period.
3. Group by delivery date and count all unique memberids (but this obviously counts unique memberids for each day individually!).
Here's the corresponding SQL:
SELECT deliverydate,COUNT(DISTINCT memberid) FROM orders
WHERE
MemberId IN (SELECT DISTINCT memberid FROM orders WHERE deliverydate BETWEEN '2019-10-25' AND '2019-10-26')
AND NOT
MemberId In (SELECT DISTINCT memberid FROM orders WHERE deliverydate < '2019-10-25')
GROUP BY deliverydate
ORDER BY deliverydate ASC;
But this results in the following with the above data:
+--------------+--------------------------+
| deliverydate | COUNT(DISTINCT memberid) |
+--------------+--------------------------+
| 2019-10-25 | 2 |
| 2019-10-26 | 2 |
+--------------+--------------------------+
The count for 2019-10-26 should be 1.
Appreciate any help :)

You can aggregate twice:
select first_deliverydate, count(*) cnt
from (
select min(deliverydate) first_deliverydate
from orders
group by memberid
) t
group by first_deliverydate
order by first_deliverydate
The subquery gives you the first order date of each member; the outer query then aggregates and counts by first order date.
This demo on DB Fiddle with your sample data returns:
first_deliverydate | cnt
:----------------- | --:
2019-10-21 | 1
2019-10-24 | 1
2019-10-25 | 2
2019-10-26 | 1
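A minimal sketch to check the double aggregation against the question's sample data. It uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption; the SQL itself is unchanged apart from aliases), with dates stored as ISO-8601 text so that `MIN()` orders them correctly.

```python
import sqlite3

# In-memory copy of the question's `orders` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, memberid INTEGER, deliverydate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 991, "2019-10-25"), (2, 991, "2019-10-26"),
        (3, 992, "2019-10-25"), (4, 992, "2019-10-25"),
        (5, 993, "2019-10-24"), (7, 994, "2019-10-21"),
        (6, 994, "2019-10-26"), (8, 995, "2019-10-26"),
    ],
)

# Inner query: one row per member holding that member's first delivery date.
# Outer query: count how many members share each first delivery date.
rows = conn.execute("""
    SELECT first_deliverydate, COUNT(*) AS cnt
    FROM (
        SELECT MIN(deliverydate) AS first_deliverydate
        FROM orders
        GROUP BY memberid
    ) t
    GROUP BY first_deliverydate
    ORDER BY first_deliverydate
""").fetchall()

for row in rows:
    print(row)
```

Because the inner query collapses each member to a single row first, member 992's two orders on 2019-10-25 can no longer be double-counted.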
In MySQL 8.0, this can also be achieved with window functions:
select deliverydate first_deliverydate, count(*) cnt
from (
select deliverydate, row_number() over(partition by memberid order by deliverydate) rn
from orders
) t
where rn = 1
group by deliverydate
order by deliverydate
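A sketch of the window-function variant, again run in SQLite through Python's `sqlite3` as an assumed stand-in for MySQL 8.0 (SQLite supports `ROW_NUMBER()` from version 3.25). It returns the same counts as the double aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, memberid INTEGER, deliverydate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 991, "2019-10-25"), (2, 991, "2019-10-26"),
        (3, 992, "2019-10-25"), (4, 992, "2019-10-25"),
        (5, 993, "2019-10-24"), (7, 994, "2019-10-21"),
        (6, 994, "2019-10-26"), (8, 995, "2019-10-26"),
    ],
)

# ROW_NUMBER() numbers each member's orders by date; rn = 1 marks the first one.
# Member 992 has two orders on 2019-10-25, but only one of them gets rn = 1.
rows = conn.execute("""
    SELECT deliverydate AS first_deliverydate, COUNT(*) AS cnt
    FROM (
        SELECT deliverydate,
               ROW_NUMBER() OVER (PARTITION BY memberid ORDER BY deliverydate) AS rn
        FROM orders
    ) t
    WHERE rn = 1
    GROUP BY deliverydate
    ORDER BY deliverydate
""").fetchall()
print(rows)
```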
Demo on DB Fiddle

You first have to figure out each member's first delivery date:
SELECT firstdeliverydate,COUNT(DISTINCT memberid) FROM (
select memberid, min(deliverydate) as firstdeliverydate
from orders
WHERE
MemberId IN (SELECT DISTINCT memberid FROM orders WHERE deliverydate BETWEEN '2019-10-25' AND '2019-10-26')
AND NOT
MemberId In (SELECT DISTINCT memberid FROM orders WHERE deliverydate < '2019-10-25')
group by memberid)
t1
group by firstdeliverydate
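A sketch that runs this period-restricted query on the sample data using Python's `sqlite3` (an assumed stand-in for MySQL). `AND NOT MemberId IN (...)` is written as the equivalent `memberid NOT IN (...)`, and an `ORDER BY` is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, memberid INTEGER, deliverydate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 991, "2019-10-25"), (2, 991, "2019-10-26"),
        (3, 992, "2019-10-25"), (4, 992, "2019-10-25"),
        (5, 993, "2019-10-24"), (7, 994, "2019-10-21"),
        (6, 994, "2019-10-26"), (8, 995, "2019-10-26"),
    ],
)

rows = conn.execute("""
    SELECT firstdeliverydate, COUNT(DISTINCT memberid) AS cnt
    FROM (
        -- members active in the period, minus members seen before it
        SELECT memberid, MIN(deliverydate) AS firstdeliverydate
        FROM orders
        WHERE memberid IN (SELECT memberid FROM orders
                           WHERE deliverydate BETWEEN '2019-10-25' AND '2019-10-26')
          AND memberid NOT IN (SELECT memberid FROM orders
                               WHERE deliverydate < '2019-10-25')
        GROUP BY memberid
    ) t1
    GROUP BY firstdeliverydate
    ORDER BY firstdeliverydate
""").fetchall()
print(rows)
```

This gives the count of 1 for 2019-10-26 that the question expects: member 991 is now counted only on its first date.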

Get the first order of each customer with NOT EXISTS, and then GROUP BY deliverydate to count the distinct customers who placed their first order on that date:
select o.deliverydate, count(distinct o.memberid) counter
from orders o
where not exists (
select 1 from orders
where memberid = o.memberid and deliverydate < o.deliverydate
)
group by o.deliverydate
See the demo.
Results:
| deliverydate | counter |
| ------------------- | ------- |
| 2019-10-21 00:00:00 | 1 |
| 2019-10-24 00:00:00 | 1 |
| 2019-10-25 00:00:00 | 2 |
| 2019-10-26 00:00:00 | 1 |
But if you want results for all the dates in the table, including dates where there were no orders from new customers (so the counter will be 0):
select d.deliverydate, count(distinct o.memberid) counter
from (
select distinct deliverydate
from orders
) d left join orders o
on o.deliverydate = d.deliverydate and not exists (
select 1 from orders
where memberid = o.memberid and deliverydate < o.deliverydate
)
group by d.deliverydate
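A sketch of the zero-count case in SQLite via Python's `sqlite3` (assumed stand-in for MySQL). The row `(9, 991, '2019-10-27')` is hypothetical and is not in the question's data; it is added only to create a date on which no new customer ordered. The inner `orders` is aliased `p` for clarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, memberid INTEGER, deliverydate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 991, "2019-10-25"), (2, 991, "2019-10-26"),
        (3, 992, "2019-10-25"), (4, 992, "2019-10-25"),
        (5, 993, "2019-10-24"), (7, 994, "2019-10-21"),
        (6, 994, "2019-10-26"), (8, 995, "2019-10-26"),
        (9, 991, "2019-10-27"),  # hypothetical: a returning customer only
    ],
)

# The NOT EXISTS test sits in the ON clause, so dates without a first order
# still survive the LEFT JOIN and COUNT(DISTINCT ...) over NULLs yields 0.
rows = conn.execute("""
    SELECT d.deliverydate, COUNT(DISTINCT o.memberid) AS counter
    FROM (SELECT DISTINCT deliverydate FROM orders) d
    LEFT JOIN orders o
      ON o.deliverydate = d.deliverydate
     AND NOT EXISTS (
         SELECT 1 FROM orders p
         WHERE p.memberid = o.memberid AND p.deliverydate < o.deliverydate
     )
    GROUP BY d.deliverydate
    ORDER BY d.deliverydate
""").fetchall()
print(rows)
```

Putting the NOT EXISTS in a WHERE clause instead would discard the 2019-10-27 row entirely rather than report 0.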

Related

How to join tables only with the latest record in SQL SERVER [duplicate]

This question already has answers here:
Join to only the "latest" record with t-sql
Fetch the rows which have the Max value for a column for each distinct value of another column
I want to list all customers with their latest phone number and most recent customer type.
The phone number and type of customers change periodically, so I want only the latest record (based on the latestUpdate column), without the old values.
Customer:
+------------+--------+-------+--------+
|latestUpdate| CustID | AddID | TypeID |
+------------+--------+-------+--------+
| 2020-03-01 | 1 | 1 | 1 |
| 2020-04-07 | 2 | 2 | 2 |
| 2020-06-13 | 3 | 3 | 3 |
| 2020-03-29 | 4 | 4 | 4 |
| 2020-02-06 | 5 | 5 | 5 |
+------------+--------+-------+--------+
CustomerAddress:
+------------+--------+-----------+
|latestUpdate| AddID | Mobile |
+------------+--------+-----------+
| 2020-03-01 | 1 | 66666 |
| 2020-04-07 | 1 | 55555 |
| 2020-06-13 | 2 | 99999 |
| 2020-03-29 | 3 | 11111 |
| 2020-02-06 | 3 | 22222 |
+------------+--------+-----------+
CustomerType:
+------------+--------+-----------+
|latestUpdate| TypeId | TypeName |
+------------+--------+-----------+
| 2020-03-01 | 1 | First |
| 2020-04-07 | 1 | Second |
| 2020-06-13 | 3 | Third |
| 2020-03-29 | 4 | Fourth |
| 2020-02-06 | 5 | Fifth |
+------------+--------+-----------+
When I join the tables, I always get duplicated CustIDs instead of only the latest record.
I want to display Customer.CustID, CustomerType.TypeName and CustomerAddress.Mobile.
You need subqueries for the most recent customer type and the latest phone number, like this:
SELECT c.CustID, ct.TypeName, t.Mobile
FROM (
SELECT latestUpdate, CustID, AddID, TypeID,
ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY latestUpdate DESC) AS RowNumber
FROM Customer
) AS c
INNER JOIN (
SELECT latestUpdate, AddID, Mobile,
ROW_NUMBER() OVER (PARTITION BY AddID ORDER BY latestUpdate DESC) AS RowNumber
FROM CustomerAddress
) AS t
ON c.AddID = t.AddID
INNER JOIN (
SELECT latestUpdate, TypeID, TypeName,
ROW_NUMBER() OVER (PARTITION BY TypeID ORDER BY latestUpdate DESC) AS RowNumber
FROM CustomerType
) AS ct
ON ct.TypeID = c.TypeID
WHERE c.RowNumber = 1
AND t.RowNumber = 1
AND ct.RowNumber = 1
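A simpler way than using ROW_NUMBER is to use APPLY together with TOP 1 in an ordered subquery. OUTER APPLY keeps customers that have no matching row, whereas CROSS APPLY would drop them:
select c.CustID, t.TypeName, p.Mobile
from Customer c
outer apply (
select top 1 Mobile
from CustomerAddress a
where a.AddID = c.AddID
order by a.latestUpdate desc
) p
outer apply (
select top 1 TypeName
from CustomerType ct
where ct.TypeID = c.TypeID
order by ct.latestUpdate desc
) t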
You need to use some subqueries:
SELECT C.CustID, T.TypeName, A.Mobile
FROM Customer AS C
LEFT OUTER JOIN (SELECT *, ROW_NUMBER() OVER(PARTITION BY AddID ORDER BY latestUpdate DESC) AS N
FROM CustomerAddress) AS A
ON C.AddID = A.AddID AND A.N = 1
LEFT OUTER JOIN (SELECT *, ROW_NUMBER() OVER(PARTITION BY TypeID ORDER BY latestUpdate DESC) AS N
FROM CustomerType) AS T
ON C.TypeID = T.TypeID AND T.N = 1
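A sketch that runs the ROW_NUMBER-plus-LEFT JOIN approach on the question's three tables, using Python's `sqlite3` as an assumed stand-in for SQL Server. Mobile numbers are stored as integers here, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (latestUpdate TEXT, CustID INTEGER, AddID INTEGER, TypeID INTEGER);
    CREATE TABLE CustomerAddress (latestUpdate TEXT, AddID INTEGER, Mobile INTEGER);
    CREATE TABLE CustomerType (latestUpdate TEXT, TypeID INTEGER, TypeName TEXT);
    INSERT INTO Customer VALUES
      ('2020-03-01',1,1,1), ('2020-04-07',2,2,2), ('2020-06-13',3,3,3),
      ('2020-03-29',4,4,4), ('2020-02-06',5,5,5);
    INSERT INTO CustomerAddress VALUES
      ('2020-03-01',1,66666), ('2020-04-07',1,55555), ('2020-06-13',2,99999),
      ('2020-03-29',3,11111), ('2020-02-06',3,22222);
    INSERT INTO CustomerType VALUES
      ('2020-03-01',1,'First'), ('2020-04-07',1,'Second'), ('2020-06-13',3,'Third'),
      ('2020-03-29',4,'Fourth'), ('2020-02-06',5,'Fifth');
""")

# Number each AddID's / TypeID's rows newest-first and join only row 1.
# The rn filter lives in ON, so customers with no match keep NULLs.
rows = conn.execute("""
    SELECT c.CustID, t.TypeName, a.Mobile
    FROM Customer AS c
    LEFT JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY AddID
                         ORDER BY latestUpdate DESC) AS n
               FROM CustomerAddress) AS a
      ON c.AddID = a.AddID AND a.n = 1
    LEFT JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY TypeID
                         ORDER BY latestUpdate DESC) AS n
               FROM CustomerType) AS t
      ON c.TypeID = t.TypeID AND t.n = 1
    ORDER BY c.CustID
""").fetchall()

for row in rows:
    print(row)
```

Customer 2 has no CustomerType row with TypeId 2, and customers 4 and 5 have no address rows, so those columns come back as `None`.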
If you had used temporal tables (a feature of the ISO SQL standard for keeping the history of a table), you would always have the latest rows in the main table; old rows stay in the history table and can be queried with a point-in-time or date-interval restriction.
This is it:
select * from (
select a.CustID, b.Mobile, c.TypeName,
RANK() OVER (
PARTITION BY a.CustID
ORDER BY b.latestUpdate DESC, c.latestUpdate DESC
) as rank1
from
Customer a
left join
CustomerAddress b
on
a.AddID = b.AddID
left join
CustomerType c
on
a.TypeId = c.TypeId
) as x where rank1 = 1;
You should join the tables using the "APPLY" operator.
See: Link

SQL left join with latest record

I want to left join a table with the latest record only.
I have a Customer1 table:
+--------+----------+
| CustID | CustName |
+--------+----------+
| 1 | ABC123 |
| 2 | 456XYZ |
| 3 | 5PQR3 |
| 4 | 789XYZ |
| 5 | 789A |
+--------+----------+
SalesInvoice table:
+------------+--------+-----------+
| InvDate | CustID | InvNumber |
+------------+--------+-----------+
| 2020-03-01 | 1 | IV236 |
| 2020-04-07 | 1 | IV644 |
| 2020-06-13 | 2 | IV869 |
| 2020-03-29 | 3 | IV436 |
| 2020-02-06 | 3 | IV126 |
+------------+--------+-----------+
And this is the required output:
+--------+------------+-----------+
| CustID | InvDate | InvNumber |
+--------+------------+-----------+
| 1 | 2020-04-07 | IV644 |
| 2 | 2020-06-13 | IV869 |
| 3 | 2020-03-29 | IV436 |
| 4 | | |
| 5 | | |
+--------+------------+-----------+
For convenience, below is sample code that creates the tables.
drop table if exists #Customer1
create table #Customer1(CustID int, CustName varchar (100))
insert into #Customer1 values
(1,'ABC123'),
(2,'456XYZ'),
(3,'5PQR3'),
(4,'789XYZ'),
(5,'789A')
drop table if exists #SalesInvoice
create table #SalesInvoice(InvDate DATE, CustID INT, InvNumber varchar (100))
insert into #SalesInvoice values
('2020-03-01',1,'IV236'),
('2020-04-07',1,'IV644'),
('2020-06-13',2,'IV869'),
('2020-03-29',3,'IV436'),
('2020-02-06',3,'IV126')
I like using TOP 1 WITH TIES in this case:
SELECT TOP 1 WITH TIES c.CustID, i.InvDate, i.InvNumber
FROM #Customer1 c
LEFT JOIN #SalesInvoice i ON c.CustID = i.CustID
ORDER BY ROW_NUMBER() OVER (PARTITION BY c.CustID ORDER BY i.InvDate DESC);
Demo
The trick here is to order by ROW_NUMBER(), which assigns a sequence to each customer's rows, descending by invoice date. TOP 1 WITH TIES then keeps every row that ties for first place in that ordering, i.e. the row numbered 1 for each customer, which is that customer's most recent invoice.
I recommend outer apply:
select c.*, i.*
from #Customer1 c outer apply
(select top (1) i.*
from #SalesInvoice i
where i.custId = c.custId
order by i.invDate desc
) i;
OUTER APPLY implements a special type of join called a "lateral join", which is a very powerful construct. When learning about them, you can think of a lateral join as a correlated subquery that can return more than one column and more than one row.
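To illustrate the "correlated subquery" view of a lateral join, here is a sketch in SQLite via Python's `sqlite3`. SQLite, used here as an assumed stand-in for SQL Server, has no APPLY operator, and a scalar correlated subquery can return only one column, so the same lookup has to be repeated once per output column. That repetition is exactly what OUTER APPLY avoids. The temp-table names are written without `#`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer1 (CustID INTEGER, CustName TEXT);
    CREATE TABLE SalesInvoice (InvDate TEXT, CustID INTEGER, InvNumber TEXT);
    INSERT INTO Customer1 VALUES
      (1,'ABC123'), (2,'456XYZ'), (3,'5PQR3'), (4,'789XYZ'), (5,'789A');
    INSERT INTO SalesInvoice VALUES
      ('2020-03-01',1,'IV236'), ('2020-04-07',1,'IV644'), ('2020-06-13',2,'IV869'),
      ('2020-03-29',3,'IV436'), ('2020-02-06',3,'IV126');
""")

# Each scalar subquery fetches one column of the latest invoice per customer;
# a customer with no invoices gets NULL, as with OUTER APPLY.
rows = conn.execute("""
    SELECT c.CustID,
           (SELECT i.InvDate FROM SalesInvoice i WHERE i.CustID = c.CustID
            ORDER BY i.InvDate DESC LIMIT 1) AS InvDate,
           (SELECT i.InvNumber FROM SalesInvoice i WHERE i.CustID = c.CustID
            ORDER BY i.InvDate DESC LIMIT 1) AS InvNumber
    FROM Customer1 c
    ORDER BY c.CustID
""").fetchall()

for row in rows:
    print(row)
```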
You can use the ROW_NUMBER window function instead of a lateral join, as in this self-explanatory T-SQL:
SELECT c.CustID
, d.InvDate
, d.InvNumber
FROM #Customer1 c
LEFT JOIN (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY InvDate DESC) AS RowNo
FROM #SalesInvoice
) d
ON c.CustID = d.CustID
AND d.RowNo = 1
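A sketch that runs the ROW_NUMBER version against the sample invoices using Python's `sqlite3` (assumed stand-in for SQL Server; `#` temp-table names dropped). It reproduces the required output from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer1 (CustID INTEGER, CustName TEXT);
    CREATE TABLE SalesInvoice (InvDate TEXT, CustID INTEGER, InvNumber TEXT);
    INSERT INTO Customer1 VALUES
      (1,'ABC123'), (2,'456XYZ'), (3,'5PQR3'), (4,'789XYZ'), (5,'789A');
    INSERT INTO SalesInvoice VALUES
      ('2020-03-01',1,'IV236'), ('2020-04-07',1,'IV644'), ('2020-06-13',2,'IV869'),
      ('2020-03-29',3,'IV436'), ('2020-02-06',3,'IV126');
""")

# RowNo = 1 is each customer's newest invoice; filtering it in the ON clause
# keeps customers 4 and 5, who have no invoices, with NULL columns.
rows = conn.execute("""
    SELECT c.CustID, d.InvDate, d.InvNumber
    FROM Customer1 c
    LEFT JOIN (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY InvDate DESC) AS RowNo
        FROM SalesInvoice
    ) d
      ON c.CustID = d.CustID
     AND d.RowNo = 1
    ORDER BY c.CustID
""").fetchall()

for row in rows:
    print(row)
```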
Basically, ROW_NUMBER is used to pick out the "last" invoice per customer in a single scan of the invoice table, instead of running SELECT TOP 1 ... ORDER BY in a correlated subquery that is executed once per customer.

Retrieve the minimal create date with multiple rows

I'm trying to write an SQL query that retrieves, for each inst (see table), the row with the minimal create date, together with its amount (which isn't unique).
Unfortunately, I can't simply use GROUP BY, because the amount column isn't unique.
+--------------+--------+------+-------------+
| Company_Name | Amount | inst | Create Date |
+--------------+--------+------+-------------+
| Company A | 1000 | 4545 | 01/10/2018 |
| Company A | 400 | 4545 | 01/11/2018 |
| Company A | 200 | 4545 | 31/10/2018 |
| Company B | 2000 | 4893 | 01/10/2016 |
| Company B | 212 | 4893 | 04/10/2016 |
| Company B | 100 | 4893 | 10/10/2017 |
| Company B | 20 | 4893 | 04/10/2018 |
+--------------+--------+------+-------------+
In the above example I expect to see:
+--------------+--------+------+-------------+
| Company_Name | Amount | inst | Create Date |
+--------------+--------+------+-------------+
| Company A | 1000 | 4545 | 01/10/2018 |
| Company B | 2000 | 4893 | 01/10/2016 |
+--------------+--------+------+-------------+
Code:
SELECT
bill_company, bill_name, account_no
FROM
dbo.customer_information;
SELECT
balance_id, balance_id2, minus_balance,new_balance,
create_date, account_no
FROM
dbo.btr
SELECT
balance_id, balance_id2, expired_Date, amount, balance_type, account_no
FROM
dbo.btr_balance
SELECT
balance_ist, expired_date, account_no, balance_type
FROM
dbo.BALANCE_inst
Retrieve the minimal create date for a balance instance with the lowest balance for a balance inst.
(SELECT
bill_company,
bill_name,
account_no,
balance_ist,
amount,
MIN(create_date)
FROM
dbo.mtr btr
LEFT JOIN
btr_balance btrb ON btr.balance_id = btrb.balance_id
AND btr.balance_id2 = btrb.balance_id2
LEFT JOIN
balance_inst bali ON btr.account_no = bali.account_no
AND btrb.expired_date = bali.expired_date
GROUP BY
bill_company, bill_name, account_no,amount, balance_ist)
I have seen some solutions using correlated subqueries, but I can't seem to get my head around them.
A common table expression (CTE) will help you:
;with cte as (
select *, row_number() over(partition by company_name order by create_date) rn
from dbo.myTable
)
select * from cte
where rn = 1;
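A sketch of the CTE approach on the question's sample rows, using Python's `sqlite3` as an assumed stand-in for SQL Server and the answer's placeholder table name `myTable`. One detail matters: the question shows dates as dd/mm/yyyy strings, which do not sort chronologically as text, so the sketch stores them as ISO-8601 (`YYYY-MM-DD`). In SQL Server, a real `date` column avoids the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (company_name TEXT, amount INTEGER, inst INTEGER, create_date TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        ("Company A", 1000, 4545, "2018-10-01"),
        ("Company A", 400, 4545, "2018-11-01"),
        ("Company A", 200, 4545, "2018-10-31"),
        ("Company B", 2000, 4893, "2016-10-01"),
        ("Company B", 212, 4893, "2016-10-04"),
        ("Company B", 100, 4893, "2017-10-10"),
        ("Company B", 20, 4893, "2018-10-04"),
    ],
)

# rn = 1 is the earliest row per company; the whole row (amount included)
# is returned, so no GROUP BY on amount is needed.
rows = conn.execute("""
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY company_name ORDER BY create_date) AS rn
        FROM myTable
    )
    SELECT company_name, amount, inst, create_date
    FROM cte
    WHERE rn = 1
    ORDER BY company_name
""").fetchall()
print(rows)
```

Partitioning by `inst` instead of `company_name` gives the same result on this data, since each company has a single inst.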
Use row_number(). I assumed bill_company is your company name:
select * from
( SELECT bill_company,
bill_name,
account_no,
balance_ist,
amount,
create_date,
row_number() over(partition by bill_company order by create_date) rn
FROM dbo.mtr btr left join btr_balance btrb
on btr.balance_id = btrb.balance_id and btr.balance_id2 = btrb.balance_id2
left join balance_inst bali
on btr.account_no = bali.account_no and btrb.expired_date = bali.expired_date
) t where t.rn=1

Doing a market basket analysis on the order details

I have a table that looks like this (abbreviated):
| order_id | item_id | amount | qty | date |
|---------- |--------- |-------- |----- |------------ |
| 1 | 1 | 10 | 1 | 10-10-2014 |
| 1 | 2 | 20 | 2 | 10-10-2014 |
| 2 | 1 | 10 | 1 | 10-12-2014 |
| 2 | 2 | 20 | 1 | 10-12-2014 |
| 2 | 3 | 45 | 1 | 10-12-2014 |
| 3 | 1 | 10 | 1 | 9-9-2014 |
| 3 | 3 | 45 | 1 | 9-9-2014 |
| 4 | 2 | 20 | 1 | 11-11-2014 |
I would like to run a query that calculates the list of items that most frequently occur together.
In this case the result would be:
|items|frequency|
|-----|---------|
|1,2 |2 |
|1,3 |1 |
|2,3 |1 |
|2 |1 |
Ideally, combinations of more than one item would be presented first, followed by the most frequently ordered single items.
Could anyone please provide an example for how to structure this SQL?
This query generates all of the requested output for the cases where two items occur together. It doesn't include the last row of the requested output, since a single value (2) technically doesn't occur together with anything, although you could easily add a UNION query to include items that are ordered alone.
This is written for PostgreSQL 9.3
create table orders(
order_id int,
item_id int,
amount int,
qty int,
date timestamp
);
INSERT INTO ORDERS VALUES(1,1,10,1,'10-10-2014');
INSERT INTO ORDERS VALUES(1,2,20,1,'10-10-2014');
INSERT INTO ORDERS VALUES(2,1,10,1,'10-12-2014');
INSERT INTO ORDERS VALUES(2,2,20,1,'10-12-2014');
INSERT INTO ORDERS VALUES(2,3,45,1,'10-12-2014');
INSERT INTO ORDERS VALUES(3,1,10,1,'9-9-2014');
INSERT INTO ORDERS VALUES(3,3,45,1,'9-9-2014');
INSERT INTO ORDERS VALUES(4,2,10,1,'11-11-2014');
with order_pairs as (
select (pg1.item_id, pg2.item_id) as items, pg1.order_id
from
(select distinct order_id, item_id
from orders) as pg1
join
(select distinct order_id, item_id
from orders) as pg2
ON
(
pg1.order_id = pg2.order_id AND
pg1.item_id < pg2.item_id
)
)
SELECT items, count(*) as frequency
FROM order_pairs
GROUP by items
ORDER by items;
output
items | frequency
-------+-----------
(1,2) | 2
(1,3) | 2
(2,3) | 1
(3 rows)
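A sketch of the pair count in SQLite via Python's `sqlite3` (assumed stand-in for PostgreSQL). Items are paired within the same `order_id` rather than the same date, so two different orders placed on one day are not merged; `x.item_id < y.item_id` alone both excludes self-pairs and lists each pair once. Pairs are rendered as text such as `'1,2'` because SQLite has no row constructors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, item_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 1), (3, 3), (4, 2)],
)

# Self-join each order with itself; every unordered item pair appears once.
rows = conn.execute("""
    SELECT x.item_id || ',' || y.item_id AS items, COUNT(*) AS frequency
    FROM (SELECT DISTINCT order_id, item_id FROM orders) x
    JOIN (SELECT DISTINCT order_id, item_id FROM orders) y
      ON x.order_id = y.order_id AND x.item_id < y.item_id
    GROUP BY items
    ORDER BY items
""").fetchall()
print(rows)
```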
Market Basket Analysis with Join.
Self-join on order_id, keeping only rows where x.item_id < y.item_id. This gives, for every item, the other items sold with it in the same order. Then group by the pair and count the rows for each combination.
select items,count(*) as 'Freq' from
(select concat(x.item_id,',',y.item_id) as items from orders x
JOIN orders y ON x.order_id = y.order_id and
x.item_id != y.item_id and x.item_id < y.item_id) A
group by A.items order by A.items;
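The UNION mentioned in the first answer can be sketched like this, again in SQLite through Python's `sqlite3` as an assumed stand-in. It treats "single items" as items in orders that contain exactly one item (an interpretation, not confirmed by the question), and uses a helper column `grp` (hypothetical) to list pairs before singles, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, item_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 1), (3, 3), (4, 2)],
)

rows = conn.execute("""
    SELECT items, frequency
    FROM (
        -- item pairs bought in the same order
        SELECT x.item_id || ',' || y.item_id AS items, COUNT(*) AS frequency, 0 AS grp
        FROM orders x
        JOIN orders y ON x.order_id = y.order_id AND x.item_id < y.item_id
        GROUP BY items
        UNION ALL
        -- items from orders that contain only one item
        SELECT CAST(item_id AS TEXT), COUNT(*), 1
        FROM orders
        WHERE order_id IN (SELECT order_id FROM orders
                           GROUP BY order_id HAVING COUNT(*) = 1)
        GROUP BY item_id
    )
    ORDER BY grp, frequency DESC, items
""").fetchall()
print(rows)
```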

if more than 1 match, do not return 'unknown'

I composed a monster query. I'm certain that it can be optimized, and I would more than appreciate any comments/guidance on the query itself; however, I have a specific question:
The data I am returning is sometimes duplicated on multiple columns:
+-------+------+----------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+----------+------+-------+--------+----------+-------+------+
| Alex | Jue | UNKNOWN | MD | 11 | 123123 | 102889 | 7 | 2012 |
| Alex | Jue | BJ123123 | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+----------+------+-------+--------+----------+-------+------+
As you can see, all of the fields are equal except for deaID.
In this case, I would like to return only:
+-------+------+----------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+----------+------+-------+--------+----------+-------+------+
| Alex | Jue | BJ123123 | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+----------+------+-------+--------+----------+-------+------+
However, if there are no duplicates:
+-------+------+---------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+---------+------+-------+--------+----------+-------+------+
| Alex | Jue | UNKNOWN | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+---------+------+-------+--------+----------+-------+------+
then I would like to keep it!
Summary
If there are duplicates, remove all records with deaID = 'UNKNOWN'; however, if there is only one match, return that match.
Question
How do I return UNKNOWN records if and only if there is exactly one match?
here is the monster query in case anybody is interested :)
with ctebiggie as (
select distinct
p.[IMS_PRESCRIBER_ID],
p.PHYSICIAN_NPI as MLISNPI,
a.CLIENT_ID,
p.MLIS_FIRSTNAME,
p.MLIS_LASTNAME,
p_address.IMS_DEA_NBR,
p.IMS_PROFESSIONAL_ID_NBR,
p.IMS_PROFESSIONAL_ID_NBR_src,
p.IMS_CERTIFICATION_CODE,
datepart(mm,a.RECEIVED_DATE) as [Month],
datepart(yyyy,a.RECEIVED_DATE) as [Year]
from
MILLENNIUM_DW_dev..D_PHYSICIAN p
left outer join
MILLENNIUM_DW_dev..F_ACCESSION_DAILY a
on a.REQUESTOR_NPI=p.PHYSICIAN_NPI
left outer join MILLENNIUM_DW_dev..D_PHYSICIAN_ADDRESS p_address
on p.PHYSICIAN_NPI=p_address.PHYSICIAN_NPI
where
a.RECEIVED_DATE is not null
--and p.IMS_PRESCRIBER_ID is not null
--and p_address.IMS_DEA_NBR !='UNKNOWN'
and p.REC_ACTIVE_FLG=1
and p_address.REC_ACTIVE_FLG=1
and DATEPART(yyyy,received_date)=2012
and DATEPART(mm,received_date)=7
group by
p.[IMS_PRESCRIBER_ID],
p.PHYSICIAN_NPI,
p.IMS_PROFESSIONAL_ID_NBR,
p.MLIS_FIRSTNAME,
p.MLIS_LASTNAME,
p_address.IMS_DEA_NBR,
p.IMS_PROFESSIONAL_ID_NBR,
p.IMS_PROFESSIONAL_ID_NBR_src,
p.IMS_CERTIFICATION_CODE,
datepart(mm,a.RECEIVED_DATE),
datepart(yyyy,a.RECEIVED_DATE),
a.CLIENT_ID
)
,
ctecount as
(select
COUNT (Distinct f.ACCESSION_ID) [count],
f.REQUESTOR_NPI,f.CLIENT_ID,
datepart(mm,f.RECEIVED_DATE) mm,
datepart(yyyy,f.RECEIVED_DATE)yyyy
from MILLENNIUM_DW_dev..F_ACCESSION_DAILY f
where
f.CLIENT_ID not in (select * from SalesDWH..TestPractices)
and DATEPART(yyyy,f.received_date)=2012
and DATEPART(mm,f.received_date)=7
group by f.REQUESTOR_NPI,
f.CLIENT_ID,
datepart(mm,f.RECEIVED_DATE),
datepart(yyyy,f.RECEIVED_DATE)
)
select ctebiggie.*,c.* from
ctebiggie
full outer join
ctecount c
on c.REQUESTOR_NPI=ctebiggie.MLISNPI
and c.mm=ctebiggie.[Month]
and c.yyyy=ctebiggie.[Year]
and c.CLIENT_ID=ctebiggie.CLIENT_ID
Assuming you have the base query, I would compute ROW_NUMBER() and COUNT() partitioned over this result set. Then, in the outer SELECT, an UNKNOWN row is kept only when every row in its group is UNKNOWN (one such row is returned); otherwise only the non-UNKNOWN rows are returned.
SELECT first,
last,
deaID,
cert,
count,
npi,
clientid,
month,
year
FROM (
SELECT first,
last,
deaID,
cert,
count,
npi,
clientid,
month,
year,
ROW_NUMBER() OVER (PARTITION BY
first,last,cert,count,npi,clientid,month,year
ORDER BY CASE WHEN deaID = 'UNKNOWN' THEN 0 ELSE 1 END,
deaID) AS RowNumberInGroup,
COUNT(*) OVER (PARTITION BY first,last,cert,count,npi,clientid,month,year)
AS CountPerGroup,
SUM(CASE WHEN deaID = 'UNKNOWN' THEN 1 ELSE 0 END)
OVER (PARTITION BY first,last,cert,count,npi,clientid,month,year)
AS UnknownCountPerGroup
FROM BaseQuery
) T
WHERE (T.CountPerGroup = T.UnknownCountPerGroup AND T.RowNumberInGroup = 1) OR T.RowNumberInGroup > T.UnknownCountPerGroup
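A sketch of the windowed filter using Python's `sqlite3` as an assumed stand-in for SQL Server. The table `base_query` and its columns `first_name`, `last_name`, `dea_id`, `npi` are a hypothetical, reduced version of the base query's output, and the `Jane Doe` row is invented to show a group with a single UNKNOWN match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base_query (first_name TEXT, last_name TEXT, dea_id TEXT, npi INTEGER)")
conn.executemany(
    "INSERT INTO base_query VALUES (?, ?, ?, ?)",
    [
        ("Alex", "Jue", "UNKNOWN", 123123),
        ("Alex", "Jue", "BJ123123", 123123),
        ("Jane", "Doe", "UNKNOWN", 456456),  # hypothetical: only match is UNKNOWN
    ],
)

# UNKNOWN rows sort first in each group, so rn > unknown_cnt selects the
# real IDs; a group made only of UNKNOWN rows keeps its first row.
rows = conn.execute("""
    SELECT first_name, last_name, dea_id, npi
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY first_name, last_name, npi
                   ORDER BY CASE WHEN dea_id = 'UNKNOWN' THEN 0 ELSE 1 END, dea_id) AS rn,
               COUNT(*) OVER (PARTITION BY first_name, last_name, npi) AS cnt,
               SUM(CASE WHEN dea_id = 'UNKNOWN' THEN 1 ELSE 0 END)
                   OVER (PARTITION BY first_name, last_name, npi) AS unknown_cnt
        FROM base_query
    ) t
    WHERE (cnt = unknown_cnt AND rn = 1) OR rn > unknown_cnt
    ORDER BY first_name
""").fetchall()

for row in rows:
    print(row)
```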
See whether this helps:
select distinct main.col1, main.col2,
isnull((select top 1 col3 from table1 where table1.col1 = main.col1
and table1.col2 = main.col2 and col3 <> 'UNKNOWN'), 'UNKNOWN') as col3
from table1 main
Sample in SQL Fiddle
Or, applied to your columns:
SELECT DISTINCT first,
last,
cert,
count,
npi,
clientid,
month,
year,
ISNULL((
SELECT TOP 1 deaID FROM table1 intable WHERE
intable.first = maintable.first AND
intable.last = maintable.last AND
intable.cert = maintable.cert AND
intable.npi = maintable.npi AND
intable.clientid = maintable.clientid AND
intable.month = maintable.month AND
intable.year = maintable.year AND
intable.deaID <> 'UNKNOWN'), 'UNKNOWN') AS deaID
FROM table1 maintable
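A sketch of the correlated-subquery approach with Python's `sqlite3` as an assumed stand-in for SQL Server: `COALESCE` takes the place of `ISNULL` and `LIMIT 1` the place of `TOP 1`. The reduced table `table1(first_name, last_name, dea_id, npi)` and the `Jane Doe` row are hypothetical. As with `TOP 1` without `ORDER BY`, if a group had several non-UNKNOWN IDs, which one is returned would be arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (first_name TEXT, last_name TEXT, dea_id TEXT, npi INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        ("Alex", "Jue", "UNKNOWN", 123123),
        ("Alex", "Jue", "BJ123123", 123123),
        ("Jane", "Doe", "UNKNOWN", 456456),  # hypothetical
    ],
)

# For each distinct person, look up any non-UNKNOWN ID; fall back to 'UNKNOWN'.
rows = conn.execute("""
    SELECT DISTINCT m.first_name, m.last_name, m.npi,
           COALESCE((SELECT i.dea_id FROM table1 i
                     WHERE i.first_name = m.first_name
                       AND i.last_name = m.last_name
                       AND i.npi = m.npi
                       AND i.dea_id <> 'UNKNOWN'
                     LIMIT 1), 'UNKNOWN') AS dea_id
    FROM table1 m
    ORDER BY m.first_name
""").fetchall()

for row in rows:
    print(row)
```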