How do I sum over multiple criteria in T-SQL?

I'm trying to improve on my very basic SQL querying skills and am using the AdventureWorks2012 sample database in SQL Server 2012. I have used SUM() OVER(PARTITION BY) like this:
SELECT DISTINCT
SUM(SubTotal) OVER (PARTITION BY CustomerID), CustomerID
FROM
Sales.SalesOrderHeader
This gets the total sales value for each customer. However, I'd like to sum the SubTotal by customer and year, using YEAR(OrderDate) to extract just the year portion of the order date.
Firstly, it appears that I can't use the year portion of the order date to sum by year independently of customer, so this approach isn't going to work anyhow.
Secondly, I can't see any way to use multiple partition criteria.
I suspect that my inexperience is leading me to think about this in the wrong way, so a theoretical approach would be as useful as a specific solution.
I guess I'm looking for something that is functionally similar to Excel's SUMIFS() function.

First, the correct way to write your query is:
SELECT CustomerID, SUM(SubTotal)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID;
Using SELECT DISTINCT with window functions is clever, but it overcomplicates the query, can perform worse, and is confusing to anyone reading it.
To get the information by year (for each customer), just add that to the SELECT and GROUP BY:
SELECT CustomerID, YEAR(OrderDate) as yyyy, SUM(SubTotal)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, YEAR(OrderDate)
ORDER BY CustomerId, yyyy;
If you actually want to get separate rows with subtotals, then study up on GROUPING SETS and ROLLUP. These are options to the GROUP BY.
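As a rough check of the point above, here is a minimal sketch that runs both forms side by side. It assumes SQLite (through Python's sqlite3 module, with SQLite 3.25 or later for window functions) as a stand-in for SQL Server, so strftime('%Y', ...) replaces YEAR(). The table is a hypothetical miniature of Sales.SalesOrderHeader with made-up rows. It also shows that PARTITION BY does accept more than one expression.

```python
import sqlite3

# Hypothetical miniature of Sales.SalesOrderHeader (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrderHeader (CustomerID INTEGER, OrderDate TEXT, SubTotal REAL)"
)
conn.executemany(
    "INSERT INTO SalesOrderHeader VALUES (?, ?, ?)",
    [
        (1, "2011-03-01", 100.0),
        (1, "2011-09-15", 50.0),
        (1, "2012-01-10", 25.0),
        (2, "2012-06-30", 10.0),
    ],
)

# GROUP BY with two criteria: one row per (customer, year).
grouped = conn.execute("""
    SELECT CustomerID,
           CAST(strftime('%Y', OrderDate) AS INTEGER) AS yyyy,
           SUM(SubTotal) AS total
    FROM SalesOrderHeader
    GROUP BY CustomerID, strftime('%Y', OrderDate)
    ORDER BY CustomerID, yyyy
""").fetchall()

# Window version: PARTITION BY takes several expressions as well, but every
# order row is kept, so DISTINCT is needed to collapse the duplicates.
windowed = conn.execute("""
    SELECT DISTINCT
           CustomerID,
           CAST(strftime('%Y', OrderDate) AS INTEGER) AS yyyy,
           SUM(SubTotal) OVER (
               PARTITION BY CustomerID, strftime('%Y', OrderDate)
           ) AS total
    FROM SalesOrderHeader
    ORDER BY CustomerID, yyyy
""").fetchall()

print(grouped)
print(windowed)
```

Both queries return the same totals here; the GROUP BY form simply says what is meant more directly.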

You should use GROUP BY instead of PARTITION BY whenever you need an aggregate (sum/count/max) per value of a specific column such as customerId, as follows:
select customerId, sum(subTotal)
FROM sales.salesOrderHeader
group by customerId
Edit: including the missing date requirement (in response to a comment).
If you want the calculation against more than one column, you can do it the same way. Just add the year of the order date to the GROUP BY clause, as in group by customerId, year(orderDate):
select customerId, sum(subTotal)
,year(orderDate) as orderYear -- you can leave the year out of the select list if you want to
FROM sales.salesOrderHeader
group by customerId, year(orderDate)

Related

Find the maximum average over a specific period, two tables

My task reads: "Select sales territory (name) with sales in May 2013 higher than the average monthly sales per sales territory (Use SalesTerritory, SalesHeader tables)." As I understand it, I need to find which territory had the highest average for May 2013, and I need to join two tables (the "name" field is in the "salesterritory" table and the rest of the data is in the second, but the "name" must be present).
I tried to divide the task into parts, and find at least a territory by id without a name, here is my code:
SELECT TerritoryID, MAX(avga.sal)
from (select YEAR(OrderDate) AS 'Year', MONTH(OrderDate) AS 'Month', TerritoryID, AVG(TotalDue) AS 'sal'
FROM Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate), MONTH(OrderDate), TerritoryID
having YEAR(OrderDate)=2013) as avga
group by TerritoryID
This result does not appear to be correct even at this stage. How do I do this correctly, at least without the second table?
Can you try these steps:
Separate this query into small queries that each collect part of the data you want and make sense to you. For example: one query to select the sales territory (name) with sales in May 2013, another query that returns the average monthly sales by sales territory, and so on. This will help you understand the parts of the main query that you will create.
You can now try this in one query. Perhaps common table expressions are an easier approach. Here are some examples: CTE
I believe you need both the average per territory in May 2013 and the average across all territories for the same month. Note the use of OVER() in the query below. This clause enables the calculation of an average across multiple rows, which is ideal in this situation because we need to return only those territories whose figures are higher than the overall average.
SELECT
      yyyy
    , mm
    , TerritoryID
    , territory_av
    , month_av
FROM (
    SELECT
          yyyy
        , mm
        , TerritoryID
        , territory_av
        , AVG(territory_av) OVER() AS month_av -- average of the territory averages
    FROM (
        SELECT
              YEAR(OrderDate) AS yyyy
            , MONTH(OrderDate) AS mm
            , TerritoryID
            , AVG(TotalDue) AS territory_av
        FROM Sales.SalesOrderHeader
        WHERE YEAR(OrderDate) = 2013
          AND MONTH(OrderDate) = 5
        GROUP BY
              YEAR(OrderDate)
            , MONTH(OrderDate)
            , TerritoryID
    ) AS derive1
) AS derive2
WHERE territory_av > month_av
;
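To see the OVER() step in isolation, here is a small sketch. It assumes SQLite via Python's sqlite3 (3.25 or later) instead of SQL Server, and a hypothetical table territory_avg that already holds one made-up average per territory. A window function cannot be referenced in the WHERE clause of the same query level, which is why the comparison happens one level up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated data: one average per territory (made-up values).
conn.execute("CREATE TABLE territory_avg (TerritoryID INTEGER, territory_av REAL)")
conn.executemany(
    "INSERT INTO territory_avg VALUES (?, ?)",
    [(1, 100.0), (2, 200.0), (3, 600.0)],
)

rows = conn.execute("""
    SELECT TerritoryID, territory_av, month_av
    FROM (
        SELECT TerritoryID,
               territory_av,
               -- empty OVER(): the window is the whole result set
               AVG(territory_av) OVER () AS month_av
        FROM territory_avg
    ) AS d
    WHERE territory_av > month_av   -- filter on the windowed value one level up
    ORDER BY TerritoryID
""").fetchall()

print(rows)
```

With these values the overall average is 300, so only territory 3 is returned.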
Don't use having as an alternative for where. Use where to filter table data which reduces the data processed by group by. Use having to filter aggregated values which happens after group by.
Regarding filtering for May 2013, it is more efficient to NOT use functions on data to assist filtering in a where clause. A more generic way to select a date range (that does not require changing data via functions) is like this:
WHERE OrderDate >= '2013-05-01'
AND OrderDate < '2013-06-01'
Syntax for dates differs amongst databases; you might need to convert the date literals into a date (or timestamp):
WHERE OrderDate >= to_date('2013-05-01','yyyy-mm-dd')
AND OrderDate < to_date('2013-06-01','yyyy-mm-dd')
or, in SQL Server you could use this:
WHERE OrderDate >= '20130501'
AND OrderDate < '20130601'
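The half-open range (inclusive lower bound, exclusive upper bound) can be tried out with a small sketch. It assumes SQLite via Python's sqlite3, where ISO-formatted timestamps are stored as text and compare in chronological order; in SQL Server the column would be a real datetime, but the boundary behaviour is the same. The table name orders and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with timestamps on both sides of the month boundaries.
conn.execute("CREATE TABLE orders (OrderDate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [
        ("2013-04-30 23:59:59",),
        ("2013-05-01 00:00:00",),
        ("2013-05-31 18:30:00",),
        ("2013-06-01 00:00:00",),
    ],
)

# No function is applied to OrderDate, so an index on it could be used.
half_open = [
    row[0]
    for row in conn.execute("""
        SELECT OrderDate
        FROM orders
        WHERE OrderDate >= '2013-05-01'
          AND OrderDate <  '2013-06-01'
        ORDER BY OrderDate
    """)
]

print(half_open)
```

The row from the evening of 31 May is included and midnight on 1 June is excluded, which is what a filter for May needs.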

How do you use having for multiple conditions?

In the following code, how do you exclude members whose spending is larger than $500 in a given year (instead of their total spending across all years)?
select
Year
,month
,memberkey
,sum(spending) as spending
from table1
group by
1,2,3
A HAVING clause won't work here since you really want to aggregate at the YEAR level to determine which records should be included. Traditionally you would do this with a correlated subquery, but in Teradata you can make use of the QUALIFY clause:
SELECT "Year"
,"Month"
,MemberKey
,spending
from table1
QUALIFY sum(spending) OVER (PARTITION BY "Year", MemberKey) < 500
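QUALIFY is not available in every database. As a hedged sketch of the same idea elsewhere, the window sum can be computed in a derived table and filtered one level up. The example assumes SQLite via Python's sqlite3 (3.25 or later), uses the column names from the question, and fills table1 with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table1 ("Year" INTEGER, "Month" INTEGER, MemberKey INTEGER, spending REAL)'
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (2020, 1, 1, 300.0),
        (2020, 2, 1, 300.0),  # member 1 totals 600 in 2020: excluded
        (2021, 1, 1, 100.0),  # member 1 totals 100 in 2021: kept
        (2020, 1, 2, 200.0),  # member 2 totals 200 in 2020: kept
    ],
)

rows = conn.execute("""
    SELECT "Year", "Month", MemberKey, spending
    FROM (
        SELECT t.*,
               -- yearly total per member, repeated on each of that member's rows
               SUM(spending) OVER (PARTITION BY "Year", MemberKey) AS year_total
        FROM table1 AS t
    ) AS x
    WHERE year_total < 500
    ORDER BY "Year", MemberKey, "Month"
""").fetchall()

print(rows)
```

Member 1 is dropped for 2020 only; the 2021 row for the same member survives because the total is computed per year.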

How to group by year with the year showing only once

I have tried using the following query
select distinct Year (SaleDate) AS SaleYear,Max(SalePrice)
from Sale
group by SaleDate
The years 2010 and 2014 are showing twice, even though I used DISTINCT and GROUP BY. The amounts in the max price column are different as well. Am I doing something wrong here?
You need to repeat year() in the group by:
select Year(SaleDate) AS SaleYear, Max(SalePrice)
from Sale
group by year(SaleDate);
SELECT DISTINCT with GROUP BY is almost never correct. All that your query does is aggregate by SaleDate and in the result set extract the year. That is why you see duplicates.
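The difference is easy to reproduce. The sketch below assumes SQLite via Python's sqlite3, with strftime('%Y', ...) in place of Year(), and a made-up Sale table holding two sales in each of 2010 and 2014.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sale (SaleDate TEXT, SalePrice INTEGER)")
conn.executemany(
    "INSERT INTO Sale VALUES (?, ?)",
    [
        ("2010-01-05", 100),
        ("2010-07-20", 250),
        ("2014-03-03", 80),
        ("2014-11-11", 40),
    ],
)

# Grouping by the full date: one group per day, so each year can repeat.
by_date = conn.execute("""
    SELECT DISTINCT strftime('%Y', SaleDate) AS SaleYear, MAX(SalePrice)
    FROM Sale
    GROUP BY SaleDate
    ORDER BY 1, 2
""").fetchall()

# Grouping by the year expression: exactly one row per year.
by_year = conn.execute("""
    SELECT strftime('%Y', SaleDate) AS SaleYear, MAX(SalePrice)
    FROM Sale
    GROUP BY strftime('%Y', SaleDate)
    ORDER BY 1
""").fetchall()

print(by_date)
print(by_year)
```

DISTINCT cannot merge the rows in the first result because the maximum differs per day, so the (year, max) pairs are already distinct.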

SQL QUERY - Filter date from the beginning

How do I write a SQL query that sums the amount over all previous days/years, that is, from the very start?
Scenario: I want to compute the accumulated sales of the store from the day it was opened.
Example
SELECT SUM(AMOUNT)
FROM TransactionTable
WHERE TransactionDate = ???
The plan that I have is to query on this table and get the oldest transaction date record, then I will use that in the WHERE condition. Do you think that it is the best solution?
You can try the query below, which uses HAVING with MIN(TransactionDate) so that the sum starts from the date of the first transaction:
select sum (amt) from
(
SELECT SUM(AMOUNT) as amt from TransactionTable
group by TransactionDate
having TransactionDate between min(TransactionDate) and getdate()
)A
To compute accumulated sales of the store from the day it started you can use SUM with OVER clause
SELECT TransactionDate, SUM(AMOUNT) OVER (ORDER BY TransactionDate) AS AccumulatedSales
FROM TransactionTable
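A small sketch of the running total, assuming SQLite via Python's sqlite3 (3.25 or later) and made-up rows. Note that with only ORDER BY in the OVER clause, rows sharing the same TransactionDate are peers in the default frame and all show the same accumulated value. Summing per day first gives one row per date. The grand total since opening needs no date filter at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TransactionTable (TransactionDate TEXT, AMOUNT INTEGER)")
conn.executemany(
    "INSERT INTO TransactionTable VALUES (?, ?)",
    [
        ("2020-01-01", 10),
        ("2020-01-02", 20),
        ("2020-01-02", 5),
        ("2020-01-03", 7),
    ],
)

# Per-row running total: both rows for 2020-01-02 are peers and show 35.
per_row = conn.execute("""
    SELECT TransactionDate,
           SUM(AMOUNT) OVER (ORDER BY TransactionDate) AS AccumulatedSales
    FROM TransactionTable
    ORDER BY TransactionDate
""").fetchall()

# One row per day: aggregate per date first, then accumulate.
daily = conn.execute("""
    SELECT TransactionDate,
           SUM(day_total) OVER (ORDER BY TransactionDate) AS AccumulatedSales
    FROM (
        SELECT TransactionDate, SUM(AMOUNT) AS day_total
        FROM TransactionTable
        GROUP BY TransactionDate
    ) AS d
    ORDER BY TransactionDate
""").fetchall()

# Everything since the first transaction: simply no WHERE clause.
grand_total = conn.execute("SELECT SUM(AMOUNT) FROM TransactionTable").fetchone()[0]

print(per_row)
print(daily)
print(grand_total)
```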
To get a total per day, use GROUP BY on the date part of TransactionDate:
SELECT convert(date,TransactionDate), SUM(AMOUNT) from TransactionTable
group by convert(date,TransactionDate)

Query on MAX on date column, and COUNT of another column

I performed the following query with cte's, but I was wondering if there was a simpler way of writing the code, maybe with subqueries? I'm retrieving everything from one table SALES, but I'm using 3 columns: AgentID, SaleDate, and OrderID.
WITH RECENT_SALE AS (
SELECT AGENTID,
SALEDATE,
ROW_NUMBER() OVER (PARTITION BY AGENTID ORDER BY SALEDATE DESC) AS RN
FROM SALES
)
,
COUNT_SALE AS (
SELECT AGENTID,
COUNT(ORDERID) AS COUNTORDERS
FROM SALES
GROUP BY AGENTID
)
SELECT RECENT_SALE.AGENTID,
SALEDATE,
COUNTORDERS
FROM RECENT_SALE
INNER JOIN COUNT_SALE ON RECENT_SALE.AGENTID = COUNT_SALE.AGENTID
WHERE RECENT_SALE.RN = 1;
It looks to me like you're just trying to get the total number of sales per agent as well as the date of his or her most recent sale? If I understand your structure correctly (and I may not), then it seems pretty straightforward. I'm guessing orderid is the primary key of SALES?
SELECT agentid, MAX(saledate) AS saledate -- Most recent sale date
, COUNT(orderid) AS countsales -- total sales
FROM sales
GROUP BY agentid;
There does not seem to be any need for CTEs or subqueries here.
Try this:
SELECT
saledate,
AGENTID,
sum(count(orderid)) over(partition by AGENTID order by saledate) AS running_count -- running count of orders per agent
FROM SALES
group by
saledate,
AGENTID