SQL: How do I count the number of clients that have already bought the same product? - sql

I have a table like the one below. It is a record of daily featured products and the customers that purchased them (similar to a daily deal site). A given client can only purchase a product one time per feature, but they may purchase the same product if it is featured multiple times.
FeatureID | ClientID | FeatureDate | ProductID
1 1002 2011-05-01 500
1 2333 2011-05-01 500
1 4458 2011-05-01 500
2 8888 2011-05-10 700
2 2333 2011-05-10 700
2 1111 2011-05-10 700
3 1002 2011-05-20 500
3 4444 2011-05-20 500
4 4444 2011-05-30 500
4 2333 2011-05-30 500
4 1002 2011-05-30 500
I want to count by FeatureID the number of clients that purchased FeatureID X AND who purchased the same productID during a previous feature.
For the table above the expected result would be:
FeatureID | CountofReturningClients
1 0
2 0
3 1
4 3
Ideally I would like to do this with SQL, but am also open to doing some manipulation in Excel/PowerPivot. Thanks!!

If you join your table to itself, you can find the data you're looking for. Be careful, because this query can take a long time if the table has a lot of data and is not indexed well.
SELECT t_current.FEATUREID, COUNT(DISTINCT t_prior.CLIENTID)
FROM table_name t_current
LEFT JOIN table_name t_prior
ON t_current.FEATUREDATE > t_prior.FEATUREDATE
AND t_current.CLIENTID = t_prior.CLIENTID
AND t_current.PRODUCTID = t_prior.PRODUCTID
GROUP BY t_current.FEATUREID

"Per feature, count the clients who match for any earlier Features with the same product"
SELECT
Curr.FeatureID
COUNT(DISTINCT Prev.ClientID) AS CountofReturningClients --edit thanks to feedback
FROM
MyTable Curr
LEFT JOIN
MyTable Prev WHERE Curr.FeatureID > Prev.FeatureID
AND Curr.ClientID = Prev.ClientID
AND Curr.ProductID = Prev.ProductID
GROUP BY
Curr.FeatureID

Assumptions: You have a table called Features that is:
FeatureID, FeatureDate, ProductID
If not then you could always create one on the fly with a temporary table, cte or view.
Then:
SELECT
FeatureID
, (
SELECT COUNT(DISTINCT ClientID) FROM Purchases WHERE Purchases.FeatureDate < Feature.FeatureDate AND Feature.ProductID = Purchases.ProductID
) as CountOfReturningClients
FROM Features
ORDER BY FeatureID

New to this, but wouldn't the following work?
SELECT FeatureID, (CASE WHEN COUNT(clientid) > 1 THEN COUNT(clientid) ELSE 0 END)
FROM table
GROUP BY featureID

Related

How to Select ID's in SQL (Databricks) in which at least 2 items from a list are present

I'm working with patient-level data in Azure Databricks and I'm trying to build out a cohort of patients that have at least 2 diagnoses from a list of specific diagnosis codes. This is essentially what the table looks like:
CLAIM_ID | PTNT_ID | ICD_CD | DATE
---------+---------+--------+------------
1 101 2500 01_25_2020
2 101 3850 03_13_2018
3 222 2500 10_26_2018
4 222 8888 11_30_2018
5 222 9155 04_01_2019
6 871 2500 02_17_2020
7 871 3200 09_09_2019
The list of ICD_CD codes of interest is something like [2500, 3850, 8888]. In this case, I would want to return TOTAL UNIQUE PTNT_ID = 2. These would be PTNT_ID = (101, 222) as these are the only two patients that have at least 2 ICD_CD codes of interest.
When I use something like this, I'm able to return all of the relevant PTNT_ID values, but I'm not able to get the total count of these PTNT_ID:
select mc.PTNT_ID
from MEDICAL_CLAIMS mc
where mc.PTNT_ID in ( # list of ICD_CD of interest
)
group by mc.PTNT_ID
having count(distinct mc.PTNT) >= 2
When I try to add a COUNT statement in, it returns an error
Just select from the query:
select count(*)
from
(
select mc.PTNT_ID
from MEDICAL_CLAIMS mc
where mc.PTNT_ID in ( # list of ICD_CD of interest )
group by mc.PTNT_ID
having count(distinct mc.PTNT) >= 2
) ptnts;

Access sql Moving Average of Top N With 2 criterias

I have been searching the forum and found a single post that is a little smilair to my problem here: Calculate average for Top n combined with SQL Group By.
My situation is:
I have a table tblWEIGHT that contains: ID, Date, idPONR, Weight
I have a second table tblSALES that contains: ID, Date, Sales, idPONR
I have a third table tblPONR that contains: ID, PONR, idProduct
And a fouth table tblPRODUCT that contais: ID, Product
The linking:
tblWEIGHT.idPONR = tblPONR.ID
tblSALES.idPONR = tblPONR.ID
tblPONR.idProduct = tblPRODUCT.ID
The maintable of my query is tblSALES. I want to all my sales listed, with the moving average of the top5
weights of the PRODUCT where the date of the weight is less than the sales date, and the product is the same as the sold product. Its IMPORTANT that the result isn't grouped by the date. I need all the records of tblSALES.
i have gotten as far as to get the top 1 weight, but im not able to get the moving average instread.
The query that gest the top 1 is the following, and i am guessing that the query i need is going to look a lot like it.
SELECT tblSALES.ID, tblSALES.Dato, tblPONR.idPRODUCT,
(
SELECT top 1 Weight FROM tblWEIGHT INNER JOIN tblPONR ON tblWeight.idPONR = tblPONR.ID
WHERE tblPONR.idPRODUCT = idPRODUCT AND
SALES.Date > tblWEIGHT.Date
ORDER BY tblWEIGHT.Date desc
) AS LatestWeight
FROM tblSALES INNER JOIN VtblPONR ON tblSALES.idPONR = tblPONR.ID
this is not my exact query since im danish and i wouldnt make sense. I know im not supposed to use Date as a fieldname.
i imagine the filan query would be something like:
SELECT tblSALES.ID..... avg(SELECT TOP 5 weight .........)
but doing this i keep getting error at max 1 record can be returned by this subquery
Final Question.
How do i make a query that creates a moving average of the top 5 weights of my sold product, where the date of the weight is earlier than the date i sold the product?
EDIT Sampledata:
DATEFORMAT: dd/mm/yyyy
tblWEIGHT
ID Date idPONR Weight
1 01-01-2020 1 100
2 02-01-2020 2 200
3 03-01-2020 3 200
4 04-01-2020 3 400
5 05-01-2020 2 250
6 06-01-2020 1 150
7 07-01-2020 2 200
tblSALES
ID Date Sales(amt) idPONR
1 05-01-2020 30 1
2 06-01-2020 15 2
3 10-01-2020 20 3
tblPONR
ID PONR(production Number) idProduct
1 2521 1
2 1548 1
3 5484 2
tblPRODUCT
ID Product
1 Bricks
2 Tiles
Desired outcome read comments for AvgWeight
tblSALES.ID tblSALES.Date tblSales.Sales(amt) AvgWeigt
1 05-01-2020 30 123 -->avg(top 5 newest weight of both idPONR 1 And 2 because they are the same product, and where tblWeight.Date<05-01-2020)
2 06-01-2020 15 123 -->avg(top 5 newest weight of both idPONR 1 And 2 because they are the same product, and where tblWeight.Date<06-01-2020)
3 10-01-2020 20 123 -->avg(top 5 newest weight of idPONR 3 since thats the only idPONR with that product, and where tblWeight.Date<10-01-2020)
Consider:
Query1
SELECT tblWeight.ID AS WeightID, tblWeight.Date AS WtDate,
tblWeight.idPONR, tblPONR.PONR, tblPONR.idProduct, tblWeight.Weight, tblSales.SalesAmt,
tblSales.ID AS SalesID, tblSales.Date AS SalesDate
FROM (tblPONR INNER JOIN tblWeight ON tblPONR.ID = tblWeight.idPONR)
INNER JOIN tblSales ON tblPONR.ID = tblSales.idPONR;
Query2
SELECT * FROM Query1 WHERE WeightID IN (
SELECT TOP 5 WeightID FROM Query1 AS Dupe WHERE Dupe.idProduct = Query1.idProduct
AND Dupe.WtDate<Query1.SalesDate ORDER BY Dupe.WtDate);
Query3
SELECT Query2.SalesID, Query2.SalesDate, Query2.SalesAmt,
First(DAvg("Weight","Query2","idProduct=" & [idProduct] & " AND WtDate<#" & [SalesDate] & "#")) AS AvgWt
FROM Query2
GROUP BY Query2.SalesID, Query2.SalesDate, Query2.SalesAmt;

Aggregate payments per year per customer per type

Please consider the following payment data:
customerID paymentID pamentType paymentDate paymentAmount
---------------------------------------------------------------------
1 1 A 2015-11-28 500
1 2 A 2015-11-29 -150
1 3 B 2016-03-07 300
2 4 A 2015-03-03 200
2 5 B 2016-05-25 -100
2 6 C 2016-06-24 700
1 7 B 2015-09-22 110
2 8 B 2016-01-03 400
I need to tally per year, per customer, the sum of the diverse payment types (A = invoice, B = credit note, etc), as follows:
year customerID paymentType paymentSum
-----------------------------------------------
2015 1 A 350 : paymentID 1 + 2
2015 1 B 110 : paymentID 7
2015 1 C 0
2015 2 A 200 : paymentID 4
2015 2 B 0
2015 2 C 0
2016 1 A 0
2016 1 B 300 : paymentID 3
2016 1 C 0
2016 2 A 0
2016 2 B 300 : paymentID 5 + 8
2016 2 C 700 : paymentId 6
It is important that there are values for every category (so for 2015, customer 1 has 0 payment value for type C, but still it is good to see this).
In reality, there are over 10 payment types and about 30 customers. The total date range is 10 years.
Is this possible to do in only SQL, and if so could somebody show me how? If possible by using relatively easy queries so that I can learn from it, for instance by storing intermediary result into a #temptable.
Any help is greatly appreciated!
a simple GROUP BY with SUM() on the paymentAmount will gives you what you wanted
select year = datepart(year, paymentDate),
customerID,
paymentType,
paymentSum = sum(paymentAmount)
from payment_data
group by datepart(year, paymentDate), customerID, paymentType
This is a simple query that generates the required 0s. Note that it may not be the most efficient way to generate this result set. If you already have lookup tables for customers or payment types, it would be preferable to use those rather than the CTEs1 I use here:
declare #t table (customerID int,paymentID int,paymentType char(1),paymentDate date,
paymentAmount int)
insert into #t(customerID,paymentID,paymentType,paymentDate,paymentAmount) values
(1,1,'A','20151128', 500),
(1,2,'A','20151129',-150),
(1,3,'B','20160307', 300),
(2,4,'A','20150303', 200),
(2,5,'B','20160525',-100),
(2,6,'C','20160624', 700),
(1,7,'B','20150922', 110),
(2,8,'B','20160103', 400)
;With Customers as (
select DISTINCT customerID from #t
), PaymentTypes as (
select DISTINCT paymentType from #t
), Years as (
select DISTINCT DATEPART(year,paymentDate) as Yr from #t
), Matrix as (
select
customerID,
paymentType,
Yr
from
Customers
cross join
PaymentTypes
cross join
Years
)
select
m.customerID,
m.paymentType,
m.Yr,
COALESCE(SUM(paymentAmount),0) as Total
from
Matrix m
left join
#t t
on
m.customerID = t.customerID and
m.paymentType = t.paymentType and
m.Yr = DATEPART(year,t.paymentDate)
group by
m.customerID,
m.paymentType,
m.Yr
Result:
customerID paymentType Yr Total
----------- ----------- ----------- -----------
1 A 2015 350
1 A 2016 0
1 B 2015 110
1 B 2016 300
1 C 2015 0
1 C 2016 0
2 A 2015 200
2 A 2016 0
2 B 2015 0
2 B 2016 300
2 C 2015 0
2 C 2016 700
(We may also want to play games with a numbers table and/or generate actual start and end dates for years if the date processing above needs to be able to use an index)
Note also how similar the top of my script is to the sample data in your question - except it's actual code that generates the sample data. You may wish to consider presenting sample code in such a way in the future since it simplifies the process of actually being able to test scripts in answers.
1CTEs - Common Table Expressions. They may be thought of as conceptually similar to temp tables - except we don't actually (necessarily) materialize the results. They also are incorporated into the single query that follows them and the whole query is optimized as a whole.
Your suggestion to use temp tables means that you'd be breaking this into multiple separate queries that then necessarily force SQL to perform the task in an order that we have selected rather than letting the optimizer choose the best approach for the above single query.

Concatenating data from one row into the results from another

I have a SQL Server database of orders I'm struggling with. For a normal order a single table provides the following results:
Orders:
ID Customer Shipdate Order ID
-----------------------------------------------------------------
1 Tom 2015-01-01 100
2 Bob 2014-03-20 200
At some point they needed orders that were placed by more than one customer. So they created a row for each customer and split the record over multiple rows.
Orders:
ID Customer Shipdate Order ID
-----------------------------------------------------------------
1 Tom 2015-01-01 100
2 Bob 2014-03-20 200
3 John
4 Dan
5 2014-05-10 300
So there is another table I can join on to make sense of this which relates the three rows which are actually one order.
Joint.Orders:
ID Related ID
-----------------------------------------------------------------
5 3
5 4
I'm a little new to SQL and while I can join on the other table and filter to only get the data relating to Order ID 300, but what I'd really like is to concatenate the customers, but after searching for a while I can't see how to do this. What'd I'd really like to achieve is this as an output:
ID Customer Shipdate Order ID
----------------------------------------------------------------
1 Tom 2015-01-01 100
2 Bob 2014-03-20 200
5 John, Dan 2014-05-10 300
You should consider changing the schema first. The below query might help you get a feel of how it can be done with your current design.
Select * From Orders Where IsNull(Customer, '') <> ''
Union All
Select ID,
Customer = (Select Customer + ',' From Orders OI Where OI.ID (Select RelatedID from JointOrders JO Where JO.ID = O.ID)
,ShipDate, OrderID
From Orders O Where IsNull(O.Customer, '') = ''

Counting presence of discount ranges in SQL

I have a simple orders table for a product being sold that has a column with the discount per order. For example:
Order Number DiscountPrcnt
1234 0
1235 10
1236 41
What I would like to do is create an output where I can join the order table to a customer table and group the discounts into ranges by email, as follows:
Email_Address 0-20 20-50 50-100
joe#abc.com Yes Yes
tom#abc.com Yes Yes
So the idea is to determine if each customer (here designated by email) has ever received a discount in the specified range, and if not, it should return NULL for the range.
A simplified version of the table structure is:
Customer Table:
CustID Email
123 joe#abc.com
234 tom#abc.com
456 joe#abc.com
So emails can repeat across customers.
Orders Table:
CustID OrderID Amount DiscPrcnt
123 1234 50.00 0
234 1235 75.00 10
456 1236 20.00 41
select c.email, COUNT(o1.order_number), COUNT(o2.order_number), COUNT(o3.order_number)
from customer c, order o1, order o2, order o3
where c.CustID = o1.CustId
and c.CustID = o2.CustID
and c.CustID = o3.CustID
and o1.DiscountPrcnt > 0
and o1.DiscountPrcnt <= 20
and o2.DiscountPrcnt > 2
and o2.DiscountPrcnt <= 50
and o3.DiscountPrcnt > 50
and o3.DiscountPrcnt <= 100
group by c.email
You'll not get yes as expected but the number of time the user benefits the discount. 0 if none.