SQL query of sales by customer over multiple years for top 50 customers in one year - sql

I have a query in SQL Server that successfully returns the top 50 customers for a given year by sales. I want to expand it to return their sales for additional years, whether or not they are in the top 50 in those years.
SELECT TOP 50 CU.CustomerName, SUM(ART.SalesAnalysis) AS '2011'
FROM ARTransaction AS ART, Customer AS CU
WHERE ART.CustomerN = CU.CustomerN AND ART.PostingDate BETWEEN '2010-12-31' AND '2012-01-01'
GROUP By CU.CustomerName
ORDER BY SUM(ART.SalesAnalysis) DESC
I tried adding nested SELECT statements, but they return strange results and I'm not sure why (this approach might never work, but the results have me flabbergasted anyway). When they are included, the values in every row change and customers are duplicated.
(SELECT SUM(ART.SalesAnalysis)
WHERE ART.PostingDate BETWEEN '2011-12-31' AND '2013-01-01') AS '2012'
I tried to put a TOP statement in a nested SELECT in HAVING but that tells me
"Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to numeric."
SELECT CU.CustomerName, SUM(ART.SalesAnalysis) AS '2011'
FROM ARTransaction AS ART
JOIN Customer AS CU ON ART.CustomerN = CU.CustomerN
GROUP BY CU.CustomerNAme
HAVING CU.CustomerNAme IN
(SELECT TOP 50 CU.CustomerName
FROM ARTransaction
JOIN Customer ON ARTransaction.CustomerN = Customer.CustomerN
WHERE ARTransaction.SalesAnalysis BETWEEN '2010-12-31' AND '2012-01-01'
GROUP BY Customer.CustomerN
ORDER BY SUM(ART.SalesAnalysis) DESC)

If I understand correctly, you want the top 50 customers by sales in 2011, and you want to see every year's data for those customers regardless of whether they are in the top 50 in other years.
Try this. It might need to be tweaked a bit as I don't know the schema, but if I understand the question correctly, this should do the trick.
WITH Top50 AS (
SELECT TOP 50
CU.CustomerN
,SUM(ART.SalesAnalysis) AS SalesTotal
FROM
ARTransaction art
INNER JOIN
Customer cu
ON cu.CustomerN = art.CustomerN
WHERE
ART.PostingDate BETWEEN CAST('2011-01-01' AS DATETIME)
AND CAST('2011-12-31' AS DATETIME)
GROUP BY
CU.CustomerN
ORDER BY
SUM(ART.SalesAnalysis) DESC)
SELECT
c.CustomerName
,SUM(a.SalesAnalysis) AS TotalSales
,YEAR(a.PostingDate) AS PostingDateYear
FROM
ARTransaction a
INNER JOIN
Customer c
ON c.CustomerN = a.CustomerN
INNER JOIN
Top50 t
ON t.CustomerN = a.CustomerN
GROUP BY
c.CustomerName
,YEAR(a.PostingDate)
ORDER BY
PostingDateYear
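The same idea (pick the top customers for one year in a CTE, then join back to get every year for them) can be tried out on a small scale. The following is only a sketch under stated assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, so LIMIT stands in for TOP and strftime for YEAR(), and the three customers and their sales figures are made up for illustration.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database with made-up customers and sales."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customer (CustomerN INTEGER PRIMARY KEY, CustomerName TEXT);
        CREATE TABLE ARTransaction (CustomerN INTEGER, PostingDate TEXT, SalesAnalysis REAL);
        INSERT INTO Customer VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Cog');
        INSERT INTO ARTransaction VALUES
            (1, '2011-03-01', 100), (1, '2012-05-01', 10),
            (2, '2011-06-15', 80),  (2, '2012-02-01', 200),
            (3, '2011-07-01', 5),   (3, '2012-07-01', 500);
    """)
    return conn


def sales_by_year_for_top_customers(conn, year, n):
    """Yearly sales for the n customers with the highest sales in `year`."""
    sql = """
    WITH TopN AS (
        SELECT CustomerN
        FROM ARTransaction
        WHERE PostingDate >= :start AND PostingDate < :end
        GROUP BY CustomerN
        ORDER BY SUM(SalesAnalysis) DESC
        LIMIT :n                      -- SQLite's counterpart of TOP n
    )
    SELECT c.CustomerName,
           CAST(strftime('%Y', a.PostingDate) AS INTEGER) AS PostingYear,
           SUM(a.SalesAnalysis) AS TotalSales
    FROM ARTransaction a
    JOIN Customer c ON c.CustomerN = a.CustomerN
    JOIN TopN t ON t.CustomerN = a.CustomerN
    GROUP BY c.CustomerName, PostingYear
    ORDER BY c.CustomerName, PostingYear
    """
    params = {"start": f"{year}-01-01", "end": f"{year + 1}-01-01", "n": n}
    return conn.execute(sql, params).fetchall()


# Top 2 customers of 2011, with their sales for every year on record.
for row in sales_by_year_for_top_customers(demo_connection(), 2011, 2):
    print(row)
```

Cog has the largest 2012 sales in the toy data but is not in the 2011 top 2, so it does not appear in the output at all, which is the behaviour the question asks for.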

You could use something like the query below. It looks at all the data for the years you are interested in and then takes just the top 50 by 2011 sales.
SELECT TOP 50
CU.CustomerName,
SUM(case when year(ART.PostingDate) = 2011 -- or you could use case when ART.PostingDate BETWEEN '2011-01-01' AND '2011-12-31'
then ART.SalesAnalysis
else 0 end) AS [2011],
SUM(case when year(ART.PostingDate) = 2012
then ART.SalesAnalysis
else 0 end) AS [2012]
FROM
ARTransaction ART
inner join Customer CU
on ART.CustomerN = CU.CustomerN
WHERE ART.PostingDate BETWEEN '2011-01-01' AND '2012-12-31'
GROUP By CU.CustomerName
ORDER BY
SUM(case when year(ART.PostingDate) = 2011
then ART.SalesAnalysis
else 0 end) DESC
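To see the one-column-per-year layout in action, here is a small sketch of the conditional-aggregation approach. It is not SQL Server code: it assumes an in-memory SQLite database driven from Python's sqlite3 module, uses LIMIT in place of TOP, and names the columns sales_2011 and sales_2012 (hypothetical names chosen to avoid quoting numeric aliases). The data is invented.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database with made-up customers and sales."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customer (CustomerN INTEGER PRIMARY KEY, CustomerName TEXT);
        CREATE TABLE ARTransaction (CustomerN INTEGER, PostingDate TEXT, SalesAnalysis REAL);
        INSERT INTO Customer VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Cog');
        INSERT INTO ARTransaction VALUES
            (1, '2011-03-01', 100), (1, '2012-05-01', 10),
            (2, '2011-06-15', 80),  (2, '2012-02-01', 200),
            (3, '2011-07-01', 5),   (3, '2012-07-01', 500);
    """)
    return conn


def top_customers_pivot(conn, n):
    """One row per customer, one column per year, top n by 2011 sales."""
    sql = """
    SELECT c.CustomerName,
           SUM(CASE WHEN a.PostingDate >= '2011-01-01' AND a.PostingDate < '2012-01-01'
                    THEN a.SalesAnalysis ELSE 0 END) AS sales_2011,
           SUM(CASE WHEN a.PostingDate >= '2012-01-01' AND a.PostingDate < '2013-01-01'
                    THEN a.SalesAnalysis ELSE 0 END) AS sales_2012
    FROM ARTransaction a
    JOIN Customer c ON c.CustomerN = a.CustomerN
    WHERE a.PostingDate >= '2011-01-01' AND a.PostingDate < '2013-01-01'
    GROUP BY c.CustomerName
    ORDER BY sales_2011 DESC      -- rank by the 2011 column only
    LIMIT ?
    """
    return conn.execute(sql, (n,)).fetchall()


for row in top_customers_pivot(demo_connection(), 2):
    print(row)
```

Adding another year means adding another SUM(CASE ...) column and widening the WHERE range, while the ORDER BY keeps the ranking tied to 2011.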

Related

How to return total sold per year for several years in columnar format?

The SQL below gives me the columns for Account, Name and the Total Sale for the year 2015. But how, if possible, can I add another column for the previous year, 2014?
select a.AcctNo, b.Name, Sum(a.TotSold) as [ Total Sold ]
from Orders as A
Join Accounts as b on a.AcctNo = b.AcctNo
where (a.PurchaseDate between '1/1/2015' and '12/31/2015' )
Group by a.AcctNo, b.Name
You can use conditional aggregation:
select o.AcctNo, a.Name,
Sum(case when year(PurchaseDate) = 2015 then TotSold else 0 end) as tot_2015,
Sum(case when year(PurchaseDate) = 2014 then TotSold else 0 end) as tot_2014
from Orders o Join
Accounts a
on a.AcctNo = o.AcctNo
where PurchaseDate >= '2014-01-01' and
PurchaseDate < '2016-01-01'
Group by o.AcctNo, a.Name ;
Notes:
Use ISO standard formats for date constants.
When using dates, try not to use between. It doesn't work as expected when there is a time component. The above inequalities work regardless of a time component.
Use table aliases that are abbreviations of the table name. That makes the query easier to follow.
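The note about between and time components is easy to check. The sketch below assumes an in-memory SQLite database accessed through Python's sqlite3 module, where dates are stored as ISO text, and two invented orders. SQL Server compares datetime values rather than strings, but a row stamped on the afternoon of the last day falls outside the between range there for the same reason.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (AcctNo INTEGER, PurchaseDate TEXT, TotSold REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        (1, "2015-06-01 09:00:00", 10),
        (1, "2015-12-31 14:30:00", 20),  # afternoon of the last day of the year
    ],
)

# BETWEEN stops at midnight at the start of 2015-12-31, so the afternoon sale is lost.
between_total = conn.execute(
    "SELECT SUM(TotSold) FROM Orders "
    "WHERE PurchaseDate BETWEEN '2015-01-01' AND '2015-12-31'"
).fetchone()[0]

# A half-open range (>= start, < start of the next year) keeps every moment of 2015.
half_open_total = conn.execute(
    "SELECT SUM(TotSold) FROM Orders "
    "WHERE PurchaseDate >= '2015-01-01' AND PurchaseDate < '2016-01-01'"
).fetchone()[0]

print(between_total, half_open_total)
```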

Aggregating a sub query within query

I am currently working on aggregating the sum qty of "OUT" and "OUT+IN".
Current query is the following:
Select
a.Date
,a.DepartmentID
from
(Select
dris.Date
,dris.RentalItemKey
,dris.WarehouseKey
,ISNULL((Select TOP 1 dris.Date where OutQty=1 order by Date DESC),(Select ri.ReceiveDate from RentalItem ri where ri.RentalItemKey=dris.RentalItemKey)) as LastOutDate
,(Select d.DepartmentKey from Department d where d.Department=i.Department)as DepartmentID
, (CASE WHEN OutQty=1 OR (RepairQty=1 AND RentedQty=1) THEN 'IN' ELSE 'OUT' END) as Status
from DailyRentalItemStatus dris
inner join Inventory i on i.InventoryKey=dris.InventoryKey
where dris.Date='2014-08-02'
and i.ICode='3223700'
and i.Classification IN ('ITEM', 'ACCESSORY')
and i.AvailFor='RENT'
and i.AvailFrom='WAREHOUSE'
and dris.Warehouse='TORONTO')a
and I would like the result to be the following:
Date WarehouseID DepartmentID ICode Owned NotRedundant Out
2014-08-02 001T A00G 3223700 30 30 19
Where Owned is the items with status "OUT+IN", Out is the items with status "OUT", and Not Redundant is the items whose last out date is within the 2 years before the date.
Help would be greatly appreciated.
I think this is close to what you're looking for. Your Not Redundant description is hard to understand: which dates are you comparing? The same trick used for OUT may be used for that, though.
My query also assumes that you always have a department connecting to the inventory table and that there's always a rentalitem.receivedate.
;WITH LastOut as
(Select Max(Date) as LastOutDate, rentalItemKey
from DailyRentalItemStatus
WHERE OutQty=1
GROUP BY rentalItemKey
)
Select
dris.Date
,dris.WarehouseKey as WarehouseID
,d.DepartmentKey as DepartmentID
, i.Icode
--,ISNULL((Select TOP 1 dris.Date where OutQty=1 order by Date DESC),(Select ri.ReceiveDate from RentalItem ri where ri.RentalItemKey=dris.RentalItemKey)) as LastOutDate
, Count(1) as Owned
, Sum(CASE WHEN NOT (OutQty=1 OR (RepairQty=1 AND RentedQty=1)) THEN 1 ELSE 0 END) as OUT
, Sum(CASE WHEN ISNULL(lastout.lastoutdate, ri.ReceiveDate) >= DateAdd(yy, -2, dris.[date]) then 1 else 0 end) as NotRedundant
from DailyRentalItemStatus dris
inner join Inventory i on i.InventoryKey=dris.InventoryKey
INNER JOIN Department d ON d.Department=i.Department
INNER JOIN RentalItem ri ON ri.RentalItemKey=dris.RentalItemKey
LEFT OUTER JOIN LastOUT ON LastOut.rentalItemKey=dris.RentalItemKey
where dris.Date='2014-08-02'
and i.ICode='3223700'
and i.Classification IN ('ITEM', 'ACCESSORY')
and i.AvailFor='RENT'
and i.AvailFrom='WAREHOUSE'
and dris.Warehouse='TORONTO'
Group BY dris.Date, d.DepartmentKey, Dris.WarehouseKey , i.icode
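The counting trick used above (COUNT for everything, SUM over a CASE for each subset) can be shown on a much smaller table. This is a simplified sketch, not the asker's schema: the ItemStatus table, its IsOut flag and its LastOutDate column are hypothetical stand-ins, the three rows are invented, and it runs on in-memory SQLite through Python's sqlite3 module, so date(x, '-2 years') takes the place of DateAdd.

```python
import sqlite3


def demo_connection():
    """Hypothetical, flattened stand-in for the rental status tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ItemStatus (
            StatusDate TEXT, WarehouseKey TEXT, RentalItemKey INTEGER,
            IsOut INTEGER, LastOutDate TEXT);
        INSERT INTO ItemStatus VALUES
            ('2014-08-02', '001T', 1, 1, '2014-07-01'),
            ('2014-08-02', '001T', 2, 0, '2013-01-01'),
            ('2014-08-02', '001T', 3, 0, '2010-05-05');
    """)
    return conn


def owned_summary(conn, status_date):
    """Count all items, recently used items and items currently out."""
    sql = """
    SELECT StatusDate,
           WarehouseKey,
           COUNT(*) AS Owned,
           SUM(CASE WHEN LastOutDate >= date(StatusDate, '-2 years')
                    THEN 1 ELSE 0 END) AS NotRedundant,
           SUM(CASE WHEN IsOut = 1 THEN 1 ELSE 0 END) AS OutCount
    FROM ItemStatus
    WHERE StatusDate = ?
    GROUP BY StatusDate, WarehouseKey
    """
    return conn.execute(sql, (status_date,)).fetchall()


print(owned_summary(demo_connection(), "2014-08-02"))
```

Each CASE returns 1 for rows that belong to the subset and 0 otherwise, so the SUM is a filtered count that can sit next to the unfiltered COUNT in the same grouped row.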

How to return the most ordered item for each month

I am trying to return the most ordered product per month for the year 2007. I would like to see the name of the product, how many of them were ordered that month, and the month. I am using the AdventureWorks2012 database. I have tried a few different ways, but each time multiple products are returned for the same month instead of the one product that had the highest order quantity that month. Sorry if this is not clear. I am trying to test myself, so I make up my own questions and try to answer them. If anyone knows a site that has questions and answers like this so I can verify my results, that would be super helpful! Thanks for any help. Here is the farthest I have been able to get with the query.
WITH Ord2007Sum
AS (SELECT sum(od.orderqty) AS sorder,
od.productid,
oh.orderdate,
od.SalesOrderID
FROM Sales.SalesOrderDetail AS od
INNER JOIN
sales.SalesOrderHeader AS oh
ON od.SalesOrderID = oh.SalesOrderID
WHERE year(oh.OrderDate) = 2007
GROUP BY ProductID, oh.OrderDate, od.SalesOrderID)
SELECT max(sorder),
s.productid,
month(h.orderdate) AS morder --, s.salesorderid
FROM Ord2007Sum AS s
INNER JOIN
sales.SalesOrderheader AS h
ON s.OrderDate = h.OrderDate
GROUP BY s.ProductID, month(h.orderdate)
ORDER BY morder;
Make a CTE that groups our products by month and creates a sum
;WITH OrderRows AS
(
SELECT
od.ProductId,
MONTH(oh.OrderDate) SalesMonth,
SUM(od.orderqty) OVER (PARTITION BY od.ProductId, MONTH(oh.OrderDate) ORDER BY oh.OrderDate) ProdMonthSum
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS oh
ON od.SalesOrderID = oh.SalesOrderID
WHERE year(oh.OrderDate) = 2007
),
Make a simple numbers table to break out each month of the year
Months AS
(
SELECT 1 AS MonthNum UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8
UNION SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12
)
We query our months table against the data and select the top product for each month based on the sum
SELECT
m.MonthNum,
d.ProductID,
d.ProdMonthSum
FROM Months m
OUTER APPLY
(
SELECT TOP 1 r.ProductID, r.ProdMonthSum
FROM OrderRows r
WHERE r.SalesMonth = m.MonthNum
ORDER BY ProdMonthSum DESC
) d
Your GROUP BY should not include oh.OrderDate and od.SalesOrderID, because that aggregates your data at the wrong level. You want the ProductID that sold the most per month, so the GROUP BY columns become ProductID, datepart(mm,oh.OrderDate). As Andrew suggested, the Row_Number function is useful here: it lets you create a key that is ordered by the summed quantity and resets each month. Finally, the outer query limits the results to the first row for each month (which is the one with the highest quantity).
WITH Ord2007Sum
AS(
SELECT sum(od.orderqty) AS sorder,
od.productid,
datepart(mm,oh.OrderDate) AS [Month],
row_number() over (partition by datepart(mm,oh.OrderDate)
Order by sum(od.orderqty) desc) AS [row]
FROM Sales.SalesOrderDetail AS od
INNER JOIN
sales.SalesOrderHeader AS oh
ON od.SalesOrderID = oh.SalesOrderID
WHERE datepart(yyyy,oh.OrderDate) = 2007
GROUP BY ProductID, datepart(mm,oh.OrderDate)
)
SELECT productid,
sorder,
[month]
FROM Ord2007Sum
WHERE row =1
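Here is a small runnable sketch of the same ROW_NUMBER pattern. It does not use AdventureWorks: the two tables are cut-down, made-up versions of the order header and detail tables, and the query runs on in-memory SQLite through Python's sqlite3 module (this assumes SQLite 3.25 or later for window function support). The aggregation and the ranking are split into two CTEs to keep each step easy to read.

```python
import sqlite3


def demo_connection():
    """Cut-down, invented versions of the order header/detail tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER PRIMARY KEY, OrderDate TEXT);
        CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, ProductID INTEGER, OrderQty INTEGER);
        INSERT INTO SalesOrderHeader VALUES
            (1, '2007-01-05'), (2, '2007-01-20'), (3, '2007-02-10'), (4, '2006-12-31');
        INSERT INTO SalesOrderDetail VALUES
            (1, 707, 3), (1, 708, 5), (2, 707, 4),
            (3, 708, 2), (3, 709, 1), (4, 710, 99);
    """)
    return conn


def top_product_per_month(conn, year):
    """Return (month, product, quantity) for the best seller of each month."""
    sql = """
    WITH MonthlyQty AS (          -- step 1: total quantity per month and product
        SELECT CAST(strftime('%m', h.OrderDate) AS INTEGER) AS OrderMonth,
               d.ProductID,
               SUM(d.OrderQty) AS TotalQty
        FROM SalesOrderDetail d
        JOIN SalesOrderHeader h ON h.SalesOrderID = d.SalesOrderID
        WHERE h.OrderDate >= :start AND h.OrderDate < :end
        GROUP BY OrderMonth, d.ProductID
    ),
    Ranked AS (                   -- step 2: rank products inside each month
        SELECT OrderMonth, ProductID, TotalQty,
               ROW_NUMBER() OVER (PARTITION BY OrderMonth
                                  ORDER BY TotalQty DESC) AS rn
        FROM MonthlyQty
    )
    SELECT OrderMonth, ProductID, TotalQty
    FROM Ranked
    WHERE rn = 1
    ORDER BY OrderMonth
    """
    params = {"start": f"{year}-01-01", "end": f"{year + 1}-01-01"}
    return conn.execute(sql, params).fetchall()


print(top_product_per_month(demo_connection(), 2007))
```

ROW_NUMBER picks one row arbitrarily when two products tie for a month; RANK would return all tied products instead.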

Repeat Customers Each Year (Retention)

I've been working on this and I don't think I'm doing it right. |D
Our database doesn't keep track of how many customers we retain so we looked for an alternate method. It's outlined in this article. It suggests you have this table to fill in:
Year Number of Customers Number of customers Retained in 2009 Percent (%) Retained in 2009 Number of customers Retained in 2010 Percent (%) Retained in 2010 ....
2008
2009
2010
2011
2012
Total
The table would go out to 2012 in the headers. I'm just saving space.
It tells you to find the total number of customers you had in your starting year. To do this, I used this query since our starting year is 2008:
select YEAR(OrderDate) as 'Year', COUNT(distinct(billemail)) as Customers
from dbo.tblOrder
where OrderDate >= '2008-01-01' and OrderDate <= '2008-12-31'
group by YEAR(OrderDate)
At the moment we just differentiate our customers by email address.
Then you have to search for the same names of customers who purchased again in later years (ours are 2009, 10, 11, and 12).
I came up with this. It should find people who purchased in both 2008 and 2009.
SELECT YEAR(OrderDate) as 'Year',COUNT(distinct(billemail)) as Customers
FROM dbo.tblOrder o with (nolock)
WHERE o.BillEmail IN (SELECT DISTINCT o1.BillEmail
FROM dbo.tblOrder o1 with (nolock)
WHERE o1.OrderDate BETWEEN '2008-1-1' AND '2009-1-1')
AND o.BillEmail IN (SELECT DISTINCT o2.BillEmail
FROM dbo.tblOrder o2 with (nolock)
WHERE o2.OrderDate BETWEEN '2009-1-1' AND '2010-1-1')
--AND o.OrderDate BETWEEN '2008-1-1' AND '2013-1-1'
AND o.BillEmail NOT LIKE '%#halloweencostumes.com'
AND o.BillEmail NOT LIKE ''
GROUP BY YEAR(OrderDate)
So I'm just finding the customers who purchased in both those years. And then I'm doing an independent query to find those who purchased in 2008 and 2010, then 08 and 11, and then 08 and 12. This one finds 2008 and 2010 purchasers:
SELECT YEAR(OrderDate) as 'Year',COUNT(distinct(billemail)) as Customers
FROM dbo.tblOrder o with (nolock)
WHERE o.BillEmail IN (SELECT DISTINCT o1.BillEmail
FROM dbo.tblOrder o1 with (nolock)
WHERE o1.OrderDate BETWEEN '2008-1-1' AND '2009-1-1')
AND o.BillEmail IN (SELECT DISTINCT o2.BillEmail
FROM dbo.tblOrder o2 with (nolock)
WHERE o2.OrderDate BETWEEN '2010-1-1' AND '2011-1-1')
--AND o.OrderDate BETWEEN '2008-1-1' AND '2013-1-1'
AND o.BillEmail NOT LIKE '%#halloweencostumes.com'
AND o.BillEmail NOT LIKE ''
GROUP BY YEAR(OrderDate)
So you see I have a different query for each year comparison. They're all unrelated. So in the end I'm just finding people who bought in 2008 and 2009, and then a potentially different group that bought in 2008 and 2010, and so on. For this to be accurate, do I have to use the same grouping of 2008 buyers each time? So they bought in 2009 and 2010 and 2011, and 2012?
This is where I'm worried and not sure how to proceed or even find such data.
Any advice would be appreciated! Thanks!
How about a cross-tab on a per customer basis to help you out...
From this, you can start to analyze a bit more in bulk by comparing a customer's
current year to the previous and have a total customers count for each respective year.
From that you can run whatever percentages you want in your final output
This should get you a whole set of all years in question, and you can just keep adding years as need be for comparison. It should be very quick, especially if you have an index on ( BillEMail, OrderDate ).
The premise is that the inner query just blows through all the records and, per customer, sets a flag of 1 if there are ANY orders within the given year (via MAX()). It does this with CASE/WHEN, so each year gets its own flag for a customer. Once that has been determined, the outer query rolls those flags up: for each customer it checks whether they had a sale in both one year and the prior year, sums 1 if so and 0 otherwise, and you have your retention counts.
SELECT
SUM( case when PreQry.C2011 = 1 and PreQry.C2012 = 1 then 1 else 0 end ) as Retain2011_2012,
SUM( case when PreQry.C2010 = 1 and PreQry.C2011 = 1 then 1 else 0 end ) as Retain2010_2011,
SUM( case when PreQry.C2009 = 1 and PreQry.C2010 = 1 then 1 else 0 end ) as Retain2009_2010,
SUM( case when PreQry.C2008 = 1 and PreQry.C2009 = 1 then 1 else 0 end ) as Retain2008_2009,
SUM( PreQry.C2012 ) CustCount2012,
SUM( PreQry.C2011 ) CustCount2011,
SUM( PreQry.C2010 ) CustCount2010,
SUM( PreQry.C2009 ) CustCount2009,
SUM( PreQry.C2008 ) CustCount2008
from
( select
O.BillEMail as customer,
MAX( CASE when YEAR( O.OrderDate ) = 2012 then 1 else 0 end ) as C2012,
MAX( CASE when YEAR( O.OrderDate ) = 2011 then 1 else 0 end ) as C2011,
MAX( CASE when YEAR( O.OrderDate ) = 2010 then 1 else 0 end ) as C2010,
MAX( CASE when YEAR( O.OrderDate ) = 2009 then 1 else 0 end ) as C2009,
MAX( CASE when YEAR( O.OrderDate ) = 2008 then 1 else 0 end ) as C2008
from
dbo.tblOrder O
where
O.OrderDate >= '2008-01-01'
AND O.BillEmail NOT LIKE '%#halloweencostumes.com'
AND O.BillEmail NOT LIKE ''
group by
O.BillEMail ) as PreQry
Now, if you wanted to detect how many were "NEW" for a given year, you could just add additional columns such as testing the previous year sale flag = 0 vs current year = 1 such as
SUM( case when PreQry.C2011 = 0 and PreQry.C2012 = 1 then 1 else 0 end ) as NewIn2012,
SUM( case when PreQry.C2010 = 0 and PreQry.C2011 = 1 then 1 else 0 end ) as NewIn2011,
SUM( case when PreQry.C2009 = 0 and PreQry.C2010 = 1 then 1 else 0 end ) as NewIn2010,
SUM( case when PreQry.C2008 = 0 and PreQry.C2009 = 1 then 1 else 0 end ) as NewIn2009
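The flag-then-roll-up idea can be checked on a handful of rows. The sketch below is only an illustration under assumptions: it uses an in-memory SQLite database through Python's sqlite3 module instead of SQL Server, covers just 2008 and 2009, and the email addresses and orders are invented.

```python
import sqlite3


def demo_connection():
    """A few made-up orders keyed by billing email."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblOrder (BillEmail TEXT, OrderDate TEXT)")
    conn.executemany(
        "INSERT INTO tblOrder VALUES (?, ?)",
        [
            ("a@x", "2008-03-01"), ("a@x", "2009-04-01"),   # retained
            ("b@x", "2008-05-01"),                          # lost after 2008
            ("c@x", "2009-06-01"),                          # new in 2009
            ("d@x", "2008-01-10"), ("d@x", "2008-11-11"), ("d@x", "2009-02-02"),
            ("", "2008-07-07"),                             # blank email, ignored
        ],
    )
    return conn


def retention_2008_2009(conn):
    """Return (retained, new_in_2009, customers_2008, customers_2009)."""
    sql = """
    SELECT SUM(CASE WHEN C2008 = 1 AND C2009 = 1 THEN 1 ELSE 0 END) AS Retain2008_2009,
           SUM(CASE WHEN C2008 = 0 AND C2009 = 1 THEN 1 ELSE 0 END) AS NewIn2009,
           SUM(C2008) AS CustCount2008,
           SUM(C2009) AS CustCount2009
    FROM (
        -- one row per customer with a 0/1 flag per year
        SELECT BillEmail,
               MAX(CASE WHEN OrderDate >= '2008-01-01' AND OrderDate < '2009-01-01'
                        THEN 1 ELSE 0 END) AS C2008,
               MAX(CASE WHEN OrderDate >= '2009-01-01' AND OrderDate < '2010-01-01'
                        THEN 1 ELSE 0 END) AS C2009
        FROM tblOrder
        WHERE BillEmail <> ''
        GROUP BY BillEmail
    ) AS PreQry
    """
    return conn.execute(sql).fetchone()


print(retention_2008_2009(demo_connection()))
```

Customer d@x ordered twice in 2008 but still contributes a single flag, which is why MAX is used in the inner query rather than SUM or COUNT.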
If I understand your problem right, it sounds like you've gotten mixed up in the details. It depends on what you want your definition of 'retain' to be. How about 'also bought in some previous year'? Then, for year X, a customer is retained if they also bought from you in a previous year.
For 2012, for example:
SELECT YEAR(OrderDate) as 'Year',COUNT(distinct(billemail)) as Customers
FROM dbo.tblOrder o with (nolock)
WHERE o.BillEmail IN (SELECT DISTINCT o1.BillEmail
FROM dbo.tblOrder o1 with (nolock)
WHERE o1.OrderDate BETWEEN '2012-1-1' AND '2013-1-1')
AND o.BillEmail IN (SELECT DISTINCT o2.BillEmail
FROM dbo.tblOrder o2 with (nolock)
WHERE o2.OrderDate < '2012-1-1')
AND o.BillEmail NOT LIKE '%#halloweencostumes.com'
AND o.BillEmail NOT LIKE ''
GROUP BY YEAR(OrderDate)
Does this work?
Edit
You can take this a step farther and abstract out the year so 1 query will suffice:
SELECT YEAR(O.OrderDate) as 'Year',COUNT(distinct(billemail)) as Customers
FROM dbo.tblOrder o with (nolock)
WHERE o.BillEmail IN (SELECT DISTINCT o1.BillEmail
FROM dbo.tblOrder o1 with (nolock)
WHERE year(o1.OrderDate)=YEAR(O.OrderDate))
AND o.BillEmail IN (SELECT DISTINCT o2.BillEmail
FROM dbo.tblOrder o2 with (nolock)
WHERE year(o2.OrderDate) < year(o.orderdate))
AND o.BillEmail NOT LIKE '%#halloweencostumes.com'
AND o.BillEmail NOT LIKE ''
GROUP BY YEAR(OrderDate)
This should give you, for each year in which you had orders, the count of distinct customers who also purchased in a previous year. However, it's not in the same format as the table you want to populate.
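A quick way to test the 'also bought in some previous year' definition is to compare each order year with the customer's first order year. Note that this is a different formulation from the IN subqueries above, not the same query: a customer counts as returning in a year when that year is later than their first one. The sketch assumes in-memory SQLite via Python's sqlite3 module, and the orders and the ReturningCustomers column name are invented.

```python
import sqlite3


def demo_connection():
    """A few made-up orders keyed by billing email."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblOrder (BillEmail TEXT, OrderDate TEXT)")
    conn.executemany(
        "INSERT INTO tblOrder VALUES (?, ?)",
        [
            ("a@x", "2008-03-01"), ("a@x", "2009-04-01"), ("a@x", "2010-05-01"),
            ("b@x", "2008-05-01"),
            ("c@x", "2009-06-01"),
            ("d@x", "2008-01-10"), ("d@x", "2010-02-02"),
            ("e@x", "2010-09-09"),
        ],
    )
    return conn


def customers_and_returning_by_year(conn):
    """Return (year, distinct customers, customers seen in an earlier year)."""
    sql = """
    WITH CustYear AS (            -- one row per customer per year with an order
        SELECT DISTINCT BillEmail,
               CAST(strftime('%Y', OrderDate) AS INTEGER) AS OrderYear
        FROM tblOrder
        WHERE BillEmail <> ''
    ),
    FirstYear AS (                -- the first year each customer ordered
        SELECT BillEmail, MIN(OrderYear) AS FirstYear
        FROM CustYear
        GROUP BY BillEmail
    )
    SELECT cy.OrderYear,
           COUNT(*) AS Customers,
           SUM(CASE WHEN cy.OrderYear > fy.FirstYear THEN 1 ELSE 0 END) AS ReturningCustomers
    FROM CustYear cy
    JOIN FirstYear fy ON fy.BillEmail = cy.BillEmail
    GROUP BY cy.OrderYear
    ORDER BY cy.OrderYear
    """
    return conn.execute(sql).fetchall()


for row in customers_and_returning_by_year(demo_connection()):
    print(row)
```

Dividing ReturningCustomers by Customers gives a per-year percentage, although under this definition a customer who skipped a year (d@x in 2010) still counts as returning.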

Sqlite query comparison multiple times

I have the following schemas (sqlite):
JournalArticle(articleID, title, journal, volume, year, month)
ConferenceArticle(articleID, title, conference, year, location)
Person(name, affiliation)
Author(name, articleID)
I'm trying to get the names of all authors whose number of conference articles is >= their number of journal articles in every year from 2000-2018 inclusive. If an author has 0 articles in each category in a year, the condition still holds. The only years that matter are 2000-2018.
The query would be much easier if it was over all years since I could count the journal articles and conferences articles and make a comparison then get the names. However, I'm stuck when trying to check over every year 2000-2018.
I of course don't want to do repetitive queries over all the years. I feel like I may need to group by year but I'm not sure. So far I've been able to get all articles of both types from 2000-2018 as one large table, but I'm not sure what to do next:
select articleID, year
from JournalArticle
where year >= 2000 and year <= 2018
union
select articleID, year
from ConferenceArticle
where year >= 2000 and year <= 2018
Hmmm. Let's start by getting a count for each author and year:
select a.name, jc.year, sum(is_journal) as num_journal, sum(is_conference) as num_conference
from (select ja.articleID, ja.year, 1 as is_journal, 0 as is_conference
from JournalArticle ja
union all
select ca.articleID, ca.year, 0 as is_journal, 1 as is_conference
from ConferenceArticle ca
) jc join
Author a
on a.articleID = jc.articleID
group by a.name, jc.year
Now you can aggregate again and keep only the authors who have no year from 2000-2018 in which journal articles outnumber conference articles:
select ay.name
from (select a.name, jc.year, sum(is_journal) as num_journal, sum(is_conference) as num_conference
from (select ja.articleID, ja.year, 1 as is_journal, 0 as is_conference
from JournalArticle ja
union all
select ca.articleID, ca.year, 0 as is_journal, 1 as is_conference
from ConferenceArticle ca
) jc join
Author a
on a.articleID = jc.articleID
group by a.name, jc.year
) ay
group by ay.name
having sum(case when ay.year between 2000 and 2018 and ay.num_journal > ay.num_conference
then 1 else 0 end) = 0;
Sounds like you could use a COALESCE in the GROUP BY
SELECT a.name,
COALESCE(j.year, c.year) as "year",
COUNT(j.articleID) AS JournalArticles,
COUNT(c.articleID) AS ConferenceArticles
FROM Author a
LEFT JOIN JournalArticle j ON (j.articleID = a.articleID AND j.year BETWEEN 2000 AND 2018)
LEFT JOIN ConferenceArticle c ON (c.articleID = a.articleID AND c.year BETWEEN 2000 AND 2018)
WHERE (j.year IS NOT NULL OR c.year IS NOT NULL)
GROUP BY a.name, COALESCE(j.year, c.year)
HAVING COUNT(c.articleID) >= COUNT(j.articleID)
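The "in every year" requirement can also be phrased as "there is no year in 2000-2018 where journal articles outnumber conference articles". Here is a sketch of that formulation run through Python's sqlite3 module on an in-memory database. The authors and articles are invented, and it assumes that articleID values never collide between JournalArticle and ConferenceArticle.

```python
import sqlite3


def demo_connection():
    """Tiny invented data set following the schema from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE JournalArticle (articleID TEXT, title TEXT, journal TEXT,
                                     volume INTEGER, year INTEGER, month INTEGER);
        CREATE TABLE ConferenceArticle (articleID TEXT, title TEXT, conference TEXT,
                                        year INTEGER, location TEXT);
        CREATE TABLE Author (name TEXT, articleID TEXT);
        INSERT INTO JournalArticle (articleID, year) VALUES
            ('j1', 2005), ('j2', 2010), ('j3', 1999);
        INSERT INTO ConferenceArticle (articleID, year) VALUES
            ('c1', 2005), ('c2', 2005), ('c3', 2012);
        INSERT INTO Author VALUES
            ('Ann', 'j1'), ('Ann', 'c1'), ('Ann', 'c2'),   -- 2005: 1 journal, 2 conference
            ('Bob', 'j2'), ('Bob', 'c3'),                  -- 2010: 1 journal, 0 conference
            ('Cy', 'j3');                                  -- only a 1999 article
    """)
    return conn


def authors_conference_heavy_every_year(conn, first_year=2000, last_year=2018):
    """Authors with conference >= journal articles in every year of the range."""
    sql = """
    WITH Counts AS (              -- per author and year, inside the range only
        SELECT a.name, x.year,
               SUM(x.is_journal) AS num_journal,
               SUM(x.is_conference) AS num_conference
        FROM Author a
        JOIN (SELECT articleID, year, 1 AS is_journal, 0 AS is_conference
              FROM JournalArticle
              UNION ALL
              SELECT articleID, year, 0, 1
              FROM ConferenceArticle) x ON x.articleID = a.articleID
        WHERE x.year BETWEEN :first AND :last
        GROUP BY a.name, x.year
    )
    SELECT DISTINCT a.name
    FROM Author a
    WHERE a.name NOT IN (SELECT name FROM Counts
                         WHERE num_journal > num_conference)   -- any violating year
    ORDER BY a.name
    """
    rows = conn.execute(sql, {"first": first_year, "last": last_year}).fetchall()
    return [name for (name,) in rows]


print(authors_conference_heavy_every_year(demo_connection()))
```

Years in which an author published nothing never show up in Counts, so they cannot disqualify anyone, which matches the rule that 0 versus 0 still satisfies the condition.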