Splitting out yearly sums across columns - sql

I have 2 tables where I'm trying to grab counts of student interactions by year, then sum them in their respective year. The code I've attached works, but I'm wondering if I've neglected a much easier way of doing this calculation. For example, if I wanted to do this for more than 2 years I'd be cursing myself doing it this way.
select s.id
, coalesce(cl2016.cl2016, 0) + coalesce(wf2016.wf2016, 0) as s2016
, coalesce(cl2017.cl2017, 0) + coalesce(wf2017.wf2017, 0) as s2017
from students s
left join (
select dm.student_id
, count(dm.meeting_id) as cl2016
from dim_meetings dm
where dm.start_time between '2016-01-01' and '2016-12-31'
group by dm.student_id
) cl2016 on cl2016.student_id = s.id
left join (
select dm.student_id
, count(dm.meeting_id) as cl2017
from dim_meetings dm
where dm.start_time between '2017-01-01' and '2017-12-31'
group by dm.student_id
) cl2017 on cl2017.student_id = s.id
left join (
select sub.student_id
, count(sub.id) as wf2016
from submissions sub
where sub.submitted_at between '2016-01-01' and '2016-12-31'
group by sub.student_id
) wf2016 on wf2016.student_id = s.id
left join (
select sub.student_id
, count(sub.id) as wf2017
from submissions sub
where sub.submitted_at between '2017-01-01' and '2017-12-31'
group by sub.student_id
) wf2017 on wf2017.student_id = s.id

Use conditional aggregation:
select s.id, dm.cl2016, dm.cl2017, su.wf2016, su.wf2017,
coalesce(dm.cl2016, 0) + coalesce(su.wf2016, 0) as s2016,
coalesce(dm.cl2017, 0) + coalesce(su.wf2017, 0) as s2017
from students s left join
(select dm.student_id,
sum( (dm.start_time between '2016-01-01' and '2016-12-31')::int) as cl2016,
sum( (dm.start_time between '2017-01-01' and '2017-12-31')::int) as cl2017
from dim_meetings dm
where dm.start_time between '2016-01-01' and '2017-12-31'
group by dm.student_id
) dm
on dm.student_id = s.id left join
(select su.student_id,
sum( (su.submitted_at between '2016-01-01' and '2016-12-31')::int) as wf2016,
sum( (su.submitted_at between '2017-01-01' and '2017-12-31')::int) as wf2017
from submissions su
where su.submitted_at between '2016-01-01' and '2017-12-31'
group by su.student_id
) su
on su.student_id = s.id ;

It is useful to place a case expression inside an aggregate function as shown in the example below:
select
dm.student_id
, count(case when dm.start_time >= '2016-01-01'
and dm.start_time < '2017-01-01' then dm.meeting_id end) as cl2016
, count(case when dm.start_time >= '2017-01-01'
and dm.start_time < '2018-01-01' then dm.meeting_id end) as cl2017
from dim_meetings dm
where dm.start_time >= '2016-01-01' and dm.start_time < '2018-01-01'
group by dm.student_id
;
A case expression returns a value, then that value is used within the aggregate function just as it would any other value.
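Please also note that "between" is evil for date ranges: it is inclusive at both ends, so an upper bound of '2016-12-31' misses anything later on that day when the column carries a time. Instead, use explicit ranges via >= and < as shown above, which involves moving the upper boundary date up to the next day. See: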
Bad habits to kick : mis-handling date / range queries
What do BETWEEN and the devil have in common?
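
To make both points concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, and it assumes timestamps are stored as ISO-8601 text (which is how SQLite compares them); the table and column names follow the question. It shows one pass over dim_meetings producing a column per year, and how BETWEEN with a date-only upper bound drops a meeting late on 31 December.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table dim_meetings (meeting_id integer, student_id integer, start_time text);
    insert into dim_meetings values
        (1, 1, '2016-03-01 09:00:00'),
        (2, 1, '2016-12-31 15:30:00'),  -- late on the last day of 2016
        (3, 1, '2017-06-10 10:00:00'),
        (4, 2, '2017-01-01 00:00:00');
""")

# Conditional aggregation: count() ignores the NULL that the case
# expression yields when a row falls outside the year's half-open range.
half_open = conn.execute("""
    select student_id,
           count(case when start_time >= '2016-01-01' and start_time < '2017-01-01'
                      then meeting_id end) as cl2016,
           count(case when start_time >= '2017-01-01' and start_time < '2018-01-01'
                      then meeting_id end) as cl2017
    from dim_meetings
    where start_time >= '2016-01-01' and start_time < '2018-01-01'
    group by student_id
    order by student_id
""").fetchall()

# BETWEEN with a date-only upper bound stops at the start of 2016-12-31.
between_2016 = conn.execute("""
    select count(*) from dim_meetings
    where start_time between '2016-01-01' and '2016-12-31'
""").fetchone()[0]

print(half_open)     # [(1, 2, 1), (2, 0, 1)]
print(between_2016)  # 1, although two meetings took place in 2016
```

Adding another year is then one more count(case ...) line plus a wider where clause, rather than another derived table and join.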

Related

Summing Two Columns from Two Tables with Two Dates

I work for a CPG company and need to create a report that compares the previous month's delivered units to the next month's forecast. (Simply, our forecasting tool screws up occasionally and this will help identify when the forecast is off.)
My issue is my SQL query is summing forecast sales correctly, but the sum of total delivered is not respecting the dates I have in my WHERE clause -- it's summing total delivered for as far back as the query can reach.
Here is my query:
SELECT
DelUnits.Customer, DelUnits.ObsText01,
FinalFcst.SKU, FinalFcst.Customer,
SUM(DelUnits.Value) AS TotalDelivered,
SUM(FinalFcst.FinalFcst) AS ForecastSales
FROM
DelUnits
LEFT JOIN
FinalFcst ON DelUnits.Customer = FinalFcst.Customer
WHERE
(FinalFcst.DT >= '2018-01-01' and FinalFcst.DT <= '2018-01-31')
AND (DelUnits.Date >= '2017-12-01' and DelUnits.Date <= '2017-12-31')
AND DelUnits.ObsText01 = '10_LB'
AND FinalFcst.SKU = '10_LB'
GROUP BY
DelUnits.Customer, DelUnits.ObsText01, FinalFcst.SKU, FinalFcst.Customer
Again, the query seems to work correctly for the final forecast (summing the forecast between 1/1/18 - 1/31/18) but sums the entire delivery history for a customer. I don't understand why it won't sum the delivery history for just 12/1/17 - 12/31/17.
Thank you for your help!
Presumably, there is only one row for FinalFcst. So, either include it in the GROUP BY clause or use MAX() instead of SUM():
max(FinalFcst.FinalFcst) as ForecastSales
One way to achieve this is to calculate TotalDelivered and ForecastSales in 2 different queries and then join them together.
Try this:
SELECT DelUnits.customer,
DelUnits.obstext01,
FinalFcst.sku,
FinalFcst.customer,
totaldelivered,
forecastsales
FROM (SELECT customer,
obstext01,
Sum(value) AS TotalDelivered
FROM delunits
WHERE date >= '2017-12-01'
AND date <= '2017-12-31'
AND obstext01 = '10_LB'
GROUP BY customer,
obstext01) DelUnits
LEFT JOIN (SELECT customer,
sku,
Sum(finalfcst) AS ForecastSales
FROM finalfcst
WHERE dt >= '2018-01-01'
AND dt <= '2018-01-31'
AND sku = '10_LB'
GROUP BY customer,
sku) FinalFcst ON DelUnits.customer = FinalFcst.customer
You have a many to many relationship between the tables. Ultimately you need to SUM() one table before joining to the other to create a one to many relationship, or you end up duplicating records.
My favorite approach is a derived table:
SELECT C.Customer,
C.ObsText01,
FC.SKU,
C.TotalDelivered,
SUM(FC.FinalFcst) ForecastSales
FROM (SELECT SUM(Value) TotalDelivered, Customer, ObsText01
FROM DelUnits
WHERE Date >= '2017-12-01' AND Date <= '2017-12-31'
AND ObsText01 = '10_LB'
GROUP BY Customer, ObsText01) C
LEFT JOIN FinalFcst FC ON C.Customer = FC.Customer
AND FC.DT >= '2018-01-01'
AND FC.DT <= '2018-01-31'
AND FC.SKU = '10_LB'
GROUP BY C.Customer, C.ObsText01, FC.SKU, C.TotalDelivered
A couple things: Added your forecast table filters to the join predicate, since having those in the WHERE will create an INNER JOIN out of your LEFT JOIN. Also removed FC.Customer from the select and the group since it is redundant with C.Customer.
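
To see the duplication in isolation, here is a small sketch with Python's sqlite3 module and invented rows for a single customer (two deliveries, three forecast rows; date and SKU filters are left out to keep it short). Joining the raw tables multiplies every delivery row by every forecast row, while aggregating each side first keeps the totals intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table DelUnits (Customer text, Date text, Value integer);
    create table FinalFcst (Customer text, DT text, FinalFcst integer);
    insert into DelUnits values ('A', '2017-12-05', 10), ('A', '2017-12-20', 20);
    insert into FinalFcst values
        ('A', '2018-01-03', 100), ('A', '2018-01-10', 200), ('A', '2018-01-17', 300);
""")

# Raw join: 2 delivery rows x 3 forecast rows = 6 rows, so both sums inflate.
naive = conn.execute("""
    select d.Customer, sum(d.Value), sum(f.FinalFcst)
    from DelUnits d
    left join FinalFcst f on f.Customer = d.Customer
    group by d.Customer
""").fetchone()

# Aggregate each table to one row per customer, then join one-to-one.
pre_aggregated = conn.execute("""
    select d.Customer, d.TotalDelivered, f.ForecastSales
    from (select Customer, sum(Value) as TotalDelivered
          from DelUnits group by Customer) d
    left join (select Customer, sum(FinalFcst) as ForecastSales
               from FinalFcst group by Customer) f
           on f.Customer = d.Customer
""").fetchone()

print(naive)           # ('A', 90, 1200)
print(pre_aggregated)  # ('A', 30, 600)
```

The inflated figures are exactly the true totals multiplied by the row count on the other side (30 x 3 and 600 x 2), which is a quick way to recognise this problem in a real report.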
Maybe you could use a common table expression (CTE) to calculate the delivery history first. I am not sure of the exact SQL Server syntax, but something like this:
WITH DEL_HIST AS
(SELECT DelUnits.Customer,
DelUnits.ObsText01,
sum(DelUnits.Value) as TotalDelivered
FROM DelUnits
Where (DelUnits.Date >= '2017-12-01' and DelUnits.Date <= '2017-12-31')
and DelUnits.ObsText01 = '10_LB'
Group By DelUnits.Customer, DelUnits.ObsText01)
SELECT
DEL_HIST.Customer,
DEL_HIST.ObsText01,
FinalFcst.SKU,
FinalFcst.Customer,
DEL_HIST.TotalDelivered,
sum(FinalFcst.FinalFcst) as ForecastSales
FROM DEL_HIST
left join FinalFcst ON DEL_HIST.Customer = FinalFcst.Customer
Where (FinalFcst.DT >= '2018-01-01' and FinalFcst.DT <= '2018-01-31')
and FinalFcst.SKU = '10_LB'
Group By DEL_HIST.Customer, DEL_HIST.ObsText01, FinalFcst.SKU, FinalFcst.Customer, DEL_HIST.TotalDelivered

Is it possible to add second where condition to select this same data but from other date range?

I have two tables in SQL Server.
I want to select DeptCode, DeptName, YearToDate, PeriodToDate (2 months for example) and group it by DeptCode.
There is a result which I want to get:
In YTD column I want to get sum of totalCost since 01/01/actualYear.
In PTD column I want to get the sum from last two months.
I created a piece of code which shows me correct YTD cost but I don't know how I can add next one for getting total cost for other date range. Is it possible to do this?
SELECT
d.DeptCode,
d.DeptName,
SUM(s.TotalCost) as YTD
FROM [Departments] AS d
INNER JOIN Shipments AS s
ON d.DeptCode= s.DeptCode
WHERE s.ShipmentDate BETWEEN DateAdd(yyyy, DateDiff(yyyy, 0, GetDate()), 0)
AND GETDATE()
GROUP BY d.DeptCode, d.DeptName
Your expected output doesn't match 2 months, but here's the code to accomplish what you want. You just have to add a SUM(CASE...) on the 2nd condition.
SELECT
d.DeptCode,
d.DeptName,
SUM(s.TotalCost) as YTD,
SUM(CASE WHEN s.ShipmentDate >= DATEADD(month, -2, GETDATE()) then s.TotalCost else 0 END) as PTD
FROM [Departments] AS d
INNER JOIN Shipments AS s
ON d.DeptCode= s.DeptCode
WHERE Year(s.ShipmentDate) = Year(GETDATE())
GROUP BY d.DeptCode, d.DeptName
Just add one more column that returns 0 when not in the two-month range, e.g. SUM(CASE WHEN (date check) THEN (amount) ELSE 0 END). Check out the fifth line:
SELECT
d.DeptCode,
d.DeptName,
SUM(s.TotalCost) as YTD,
SUM(CASE WHEN DateDiff(MONTH, s.ShipmentDate, GetDate()) < 2 THEN s.TotalCost ELSE 0 END) PTD
FROM [Departments] AS d
INNER JOIN Shipments AS s
ON d.DeptCode= s.DeptCode
WHERE s.ShipmentDate BETWEEN DateAdd(yyyy, DateDiff(yyyy, 0, GetDate()), 0)
AND GETDATE()
GROUP BY d.DeptCode, d.DeptName
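
For anyone who wants to try the SUM(CASE ...) idea without a SQL Server instance, here is a rough sketch using Python's sqlite3 module. The rows are invented, the Departments join is left out, and a fixed reference date (2024-06-15) stands in for GETDATE() so the result is reproducible; SQLite's date() modifiers replace DATEADD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Shipments (DeptCode text, ShipmentDate text, TotalCost integer);
    insert into Shipments values
        ('D1', '2023-12-15', 999),  -- previous year, excluded by the where clause
        ('D1', '2024-01-10', 100),
        ('D1', '2024-05-20', 40),
        ('D1', '2024-06-01', 60),
        ('D2', '2024-02-02', 70);
""")

today = "2024-06-15"  # assumed reference date, stands in for GETDATE()

rows = conn.execute("""
    select DeptCode,
           sum(TotalCost) as YTD,
           -- only rows from the last two months contribute to PTD
           sum(case when ShipmentDate >= date(:today, '-2 months')
                    then TotalCost else 0 end) as PTD
    from Shipments
    where ShipmentDate >= date(:today, 'start of year')
      and ShipmentDate <= :today
    group by DeptCode
    order by DeptCode
""", {"today": today}).fetchall()

print(rows)  # [('D1', 200, 100), ('D2', 70, 0)]
```

The where clause sets the widest range (year to date), and each narrower range becomes its own SUM(CASE ...) column, so further periods can be added without extra joins.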
Try this one, using two CTEs:
WITH nbr_last2month_ AS
(
SELECT
s.DeptCode,
SUM(s.[TotalCost]) AS PTD
FROM [Shipments] s
WHERE YEAR(s.ShipmentDate) = YEAR(GETDATE())
AND s.ShipmentDate >= DATEADD(month, -2, GETDATE())
GROUP BY s.DeptCode
),
nbr_YTD_ AS
(
SELECT
s.DeptCode,
d.DeptName,
SUM(s.[TotalCost]) AS YTD
FROM [Shipments] s
LEFT JOIN [Departments] d ON d.[DeptCode] = s.[DeptCode]
WHERE YEAR(s.ShipmentDate) = YEAR(GETDATE())
GROUP BY s.DeptCode, d.DeptName
)
SELECT
A.DeptCode,
A.DeptName,
A.YTD,
B.PTD
FROM nbr_YTD_ A
LEFT JOIN nbr_last2month_ B ON B.DeptCode = A.DeptCode
ORDER BY A.DeptCode

Display Month Gaps for Each location

I have the following query which takes in the opps and calculates the duration, and revenue for each month. However, for some locations, where there is no data, it is missing some months. Essentially, I would like all months to appear for each of the location and record type. I tried a left outer join on the calendar but that didn't seem to work either.
Here is the query:
;With DateSequence( [Date] ) as
(
Select CAST(@fromdate as DATE) as [Date]
union all
Select CAST(dateadd(day, 1, [Date]) as Date)
from DateSequence
where Date < @todate
)
INSERT INTO CalendarTemp (Date, Day, DayOfWeek, DayOfYear, WeekOfYear, Month, MonthName, Year)
Select
[Date] as [Date],
DATEPART(DAY,[Date]) as [Day],
DATENAME(dw, [Date]) as [DayOfWeek],
DATEPART(DAYOFYEAR,[Date]) as [DayOfYear],
DATEPART(WEEK,[Date]) as [WeekOfYear],
DATEPART(MONTH,[Date]) as [Month],
DATENAME(MONTH,[Date]) as [MonthName],
DATEPART(YEAR,[Date]) as [Year]
from DateSequence option (MaxRecursion 10000)
;
DELETE FROM CalendarTemp WHERE DayOfWeek IN ('Saturday', 'Sunday');
SELECT
AccountId
,AccountName
,Office
,Stage = (CASE WHEN StageName = 'Closed Won' THEN 'Closed Won'
ELSE 'Open'
END)
,Id
,Name
,RecordType= (CASE
WHEN recordtypeid = 'LAS1' THEN 'S'
END)
,Start_Date
,End_Date
,Probability
,Estimated_Revenue_Won = ISNULL(Amount, 0)
,ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Name) AS Row
--,Revenue_Per_Day = CAST(ISNULL(Amount/NULLIF(dbo.CalculateNumberOFWorkDays(Start_Date, End_Date),0),0) as money)
,YEAR(c.Date) as year
,MONTH(c.Date) as Month
,c.MonthName
--, ISNULL(CAST(Sum((Amount)/NULLIF(dbo.CalculateNumberOFWorkDays(Start_Date, End_Date),0)) as money),0) As RevenuePerMonth
FROM SF_Extracted_Opps o
LEFT OUTER JOIN CalendarTemp c on o.Start_Date <= c.Date AND o.End_Date >= c.Date
WHERE
Start_Date <= @todate AND End_Date >= @fromdate
AND Office IN (@Location)
AND recordtypeid IN ('LAS1')
GROUP BY
AccountId
,AccountName
,Office
,(CASE WHEN StageName = 'Closed Won' THEN 'Closed Won'
ELSE 'Open'
END)
,Id
,Name
,(CASE
WHEN recordtypeid = 'LAS1' THEN 'S'
END)
,Amount
--, CAST(ISNULL(Amount/NULLIF(dbo.CalculateNumberOFWorkDays(Start_Date, End_Date),0),0) as money)
,Start_Date
,End_Date
,Probability
,YEAR(c.Date)
,Month(c.Date)
,c.MonthName
,dbo.CalculateNumberOFWorkDays(Start_Date, End_Date)
ORDER BY Office
, (CASE
WHEN recordtypeid = 'LAS1' THEN 'S'
END)
,(CASE WHEN StageName = 'Closed Won' THEN 'Closed Won'
ELSE 'Open'
END)
, [Start_Date], Month(c.Date), AccountName, Row;
I tried adding another left outer join to this, using it as a subquery and joining to the calendar on year and month, but that did not seem to work either. Suggestions would be greatly appreciated.
--Date Calendar for each location:
;With DateSequence( [Date], Location) as
(
Select CAST(@fromdate as DATE) as [Date], oo.Office as location
union all
Select CAST(dateadd(day, 1, [Date]) as Date), oo.Office as location
from DateSequence dts
join Opportunity_offices oo on 1 = 1
where Date < @todate
)
--select result
INSERT INTO CalendarTemp (Location,Date, Day, DayOfWeek, DayOfYear, WeekOfYear, Month, MonthName, Year)
Select
location,
[Date] as [Date],
DATEPART(DAY,[Date]) as [Day],
DATENAME(dw, [Date]) as [DayOfWeek],
DATEPART(DAYOFYEAR,[Date]) as [DayOfYear],
DATEPART(WEEK,[Date]) as [WeekOfYear],
DATEPART(MONTH,[Date]) as [Month],
DATENAME(MONTH,[Date]) as [MonthName],
DATEPART(YEAR,[Date]) as [Year]
from DateSequence option (MaxRecursion 10000)
;
You have your LEFT JOIN backwards. If you want all records from CalendarTemp and only those that match from SF_Extracted_Opps, then CalendarTemp should be the table on the left. Alternatively, you can switch the LEFT JOIN to a RIGHT JOIN and it should be fixed. The other issue is that your WHERE clause filters on columns from SF_Extracted_Opps, which effectively turns the outer join back into an INNER JOIN.
Here is one way to fix it:
SELECT
.....
FROM
CalendarTemp c
LEFT JOIN SF_Extracted_Opps o
ON o.Start_Date <= c.Date AND o.End_Date >= c.Date
AND o.Start_Date <= @todate AND End_Date >= @fromdate
AND o.Office IN (#Location)
AND o.recordtypeid IN ('LAS1')
The other issue you might run into is that, because you remove weekends from CalendarTemp, not all dates are represented. I would test with the weekends both left in and taken out and see whether you get different results.
This line:
AND o.Start_Date <= @todate AND End_Date >= @fromdate
should not be needed either, because the dates are already limited by the line before it and by the values in CalendarTemp.
A note about your CalendarTemp table: you don't have to go back and delete those records. Simply add the day-of-week condition as a WHERE clause on the SELECT that populates the table.
Edit: for all offices, you can CROSS JOIN your offices table with CalendarTemp. Do this in your final query, not in the CTE that builds the calendar. The problem with doing it in the calendar CTE is that it is recursive, so you would have to do it in both the anchor and the recursive member definitions.
SELECT
.....
FROM
CalendarTemp c
CROSS JOIN Opportunity_offices oo
LEFT JOIN SF_Extracted_Opps o
ON o.Start_Date <= c.Date AND o.End_Date >= c.Date
AND o.Start_Date <= @todate AND End_Date >= @fromdate
AND oo.office = o.Office
AND o.recordtypeid IN ('LAS1')
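
Here is a stripped-down sketch of the same shape, runnable with Python's sqlite3 module. Everything in it is simplified for illustration: the offices and opps tables, their columns, and the sample rows are invented, each opportunity is tied to a single month instead of a date range, and the calendar is monthly. It shows that calendar CROSS JOIN offices followed by a LEFT JOIN keeps every office and month, and that putting a condition on the opportunity table in WHERE throws the gap rows away again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table offices (office text);
    create table opps (office text, month text, amount integer);
    insert into offices values ('Boston'), ('Denver');
    insert into opps values ('Boston', '2024-01', 500), ('Boston', '2024-03', 700);
""")

# Recursive CTE producing the first day of each month in the window.
MONTHS = """
    with recursive months(month_start) as (
        select '2024-01-01'
        union all
        select date(month_start, '+1 month') from months
        where month_start < '2024-03-01'
    )
"""

# Filters on opps live in the ON clause, so unmatched months survive.
all_rows = conn.execute(MONTHS + """
    select oo.office, strftime('%Y-%m', m.month_start) as month,
           coalesce(sum(o.amount), 0) as revenue
    from months m
    cross join offices oo
    left join opps o
           on o.office = oo.office
          and o.month = strftime('%Y-%m', m.month_start)
    group by oo.office, m.month_start
    order by oo.office, m.month_start
""").fetchall()

# The same join with a WHERE condition on opps behaves like an inner join.
filtered_in_where = conn.execute(MONTHS + """
    select oo.office, strftime('%Y-%m', m.month_start) as month
    from months m
    cross join offices oo
    left join opps o
           on o.office = oo.office
          and o.month = strftime('%Y-%m', m.month_start)
    where o.amount > 0
    order by oo.office, m.month_start
""").fetchall()

for row in all_rows:
    print(row)
print(filtered_in_where)
```

The first query returns six rows (three months for each office, with 0 where nothing matched); the second returns only the two Boston rows that have data.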

Why do I get "Incorrect syntax near the keyword 'between'" in a SQL Server query that uses BETWEEN?

I am not familiar with databases.
I am working on Microsoft SQL Server and I have a problem running this query, which uses the BETWEEN keyword to select a date range.
My query is:
select NumeroPolizza ,sum(v.Ctv) as Ctv_RI
from (
select r.NumeroPolizza,SUM(r.ImportoPrestazioneIniziale) as Ctv
from Prestazione r with(nolock)
where r.NumeroPolizza in (select ID from Polizza p with(nolock) where TipoSistemaProvenienzaID=8)
--and r.DataInizio <= '2015-12-31'
and between '2016-01-01' and '2016-04-01'
group by r.NumeroPolizza
UNION
select NumeroPolizza,SUM(ImportoRivalutazioneDaPiano+ImportoRivalutazioneEstemporaneo)as Ctv
from Rivalutazione with(nolock)
where NumeroPolizza in (select ID from Polizza p with(nolock) where TipoSistemaProvenienzaID=8)
--and DAtaDecorrenza <= '2015-12-31'
and between '2016-01-01' and '2016-04-01'
group by NumeroPolizza
) v
group by NumeroPolizza
order by NumeroPolizza
As you can see, I am using BETWEEN as a filter in two WHERE conditions, like this:
and between '2016-01-01' and '2016-04-01'
The problem is that SQL Server gives me the following error message:
11:30:36 [SELECT - 0 row(s), 0.000 secs] [Error Code: 156, SQL State: S0001] Incorrect syntax near the keyword 'between'.
... 1 statement(s) executed, 0 row(s) affected, exec/fetch time: 0.000/0.000 sec [0 successful, 0 warnings, 1 errors]
What am I missing? How do I fix this issue?
You have to state what is BETWEEN, so I guess:
and DAtaDecorrenza between '2016-01-01' and '2016-04-01'
You are missing the column name before BETWEEN.
The syntax is <column_name> between <value> and <other_value>
You forgot to write the column name before BETWEEN. Please check the updated query:
select NumeroPolizza ,sum(v.Ctv) as Ctv_RI
from (
select r.NumeroPolizza,SUM(r.ImportoPrestazioneIniziale) as Ctv
from Prestazione r with(nolock)
where r.NumeroPolizza in (select ID from Polizza p with(nolock) where TipoSistemaProvenienzaID=8)
--and r.DataInizio <= '2015-12-31'
and r.DataInizio between '2016-01-01' and '2016-04-01'
group by r.NumeroPolizza
UNION
select NumeroPolizza,SUM(ImportoRivalutazioneDaPiano+ImportoRivalutazioneEstemporaneo)as Ctv
from Rivalutazione with(nolock)
where NumeroPolizza in (select ID from Polizza p with(nolock) where TipoSistemaProvenienzaID=8)
--and DAtaDecorrenza <= '2015-12-31'
and DataDecorrenza between '2016-01-01' and '2016-04-01'
group by NumeroPolizza
) v
group by NumeroPolizza
order by NumeroPolizza
I have specified the date column before BETWEEN. Please refer to this:
SELECT NumeroPolizza
,sum(v.Ctv) AS Ctv_RI
FROM (
SELECT r.NumeroPolizza
,SUM(r.ImportoPrestazioneIniziale) AS Ctv
FROM Prestazione r WITH (NOLOCK)
WHERE r.NumeroPolizza IN (
SELECT ID
FROM Polizza p WITH (NOLOCK)
WHERE TipoSistemaProvenienzaID = 8
)
--and r.DataInizio <= '2015-12-31'
AND r.DataInizio BETWEEN '2016-01-01'
AND '2016-04-01'
GROUP BY r.NumeroPolizza
UNION
SELECT NumeroPolizza
,SUM(ImportoRivalutazioneDaPiano + ImportoRivalutazioneEstemporaneo) AS Ctv
FROM Rivalutazione WITH (NOLOCK)
WHERE NumeroPolizza IN (
SELECT ID
FROM Polizza p WITH (NOLOCK)
WHERE TipoSistemaProvenienzaID = 8
)
--and DAtaDecorrenza <= '2015-12-31'
AND DAtaDecorrenza BETWEEN '2016-01-01'
AND '2016-04-01'
GROUP BY NumeroPolizza
) v
GROUP BY NumeroPolizza
ORDER BY NumeroPolizza

How to output only one max value from this query in SQL?

Yesterday Thomas helped me a lot by providing exactly the query I wanted. Now I need a variant of it, and I hope someone can help me out.
I want it to output only one row, namely a max value - but it has to build on the algorithm in the following query:
WITH Calendar AS (SELECT CAST(@StartDate AS datetime) AS Date
UNION ALL
SELECT DATEADD(d, 1, Date) AS Expr1
FROM Calendar AS Calendar_1
WHERE (DATEADD(d, 1, Date) < @EndDate))
SELECT C.Date, C2.Country, COALESCE (SUM(R.[Amount of people per day needed]), 0) AS [Allocated testers]
FROM Calendar AS C CROSS JOIN
Country AS C2 LEFT OUTER JOIN
Requests AS R ON C.Date BETWEEN R.[Start date] AND R.[End date] AND R.CountryID = C2.CountryID
WHERE (C2.Country = @Country)
GROUP BY C.Date, C2.Country OPTION (MAXRECURSION 0)
The output from above will be like:
Date Country Allocated testers
06/01/2010 Chile 3
06/02/2010 Chile 4
06/03/2010 Chile 0
06/04/2010 Chile 0
06/05/2010 Chile 19
but what I need right now is
Allocated testers
19
that is - only one column - one row - the max value itself... (for the (via parameters (that already exists)) selected period of dates and country)
Use ORDER BY and LIMIT:
ORDER BY [Allocated testers] DESC LIMIT 1
Edit: since LIMIT does not exist in SQL Server, use ORDER BY with TOP instead:
SELECT TOP 1 .... ORDER BY [Allocated testers] DESC
WITH Calendar
AS (
SELECT
CAST(@StartDate AS datetime) AS Date
UNION ALL
SELECT
DATEADD(d, 1, Date) AS Expr1
FROM
Calendar AS Calendar_1
WHERE
( DATEADD(d, 1, Date) < @EndDate )
)
SELECT TOP 1 *
FROM
(
SELECT
C.Date
,C2.Country
,COALESCE(SUM(R.[Amount of people per day needed]), 0) AS [Allocated testers]
FROM
Calendar AS C
CROSS JOIN Country AS C2
LEFT OUTER JOIN Requests AS R
ON C.Date BETWEEN R.[Start date] AND R.[End date]
AND R.CountryID = C2.CountryID
WHERE
( C2.Country = @Country )
GROUP BY
C.Date
,C2.Country
) lst
ORDER BY lst.[Allocated testers] DESC
OPTION
( MAXRECURSION 0 )
Full example following the discussion in @Salil's answer:
WITH Calendar AS (SELECT CAST(@StartDate AS datetime) AS Date
UNION ALL
SELECT DATEADD(d, 1, Date) AS Expr1
FROM Calendar AS Calendar_1
WHERE (DATEADD(d, 1, Date) < @EndDate))
SELECT TOP 1 C.Date, C2.Country, COALESCE (SUM(R.[Amount of people per day needed]), 0) AS [Allocated testers]
FROM Calendar AS C CROSS JOIN
Country AS C2 LEFT OUTER JOIN
Requests AS R ON C.Date BETWEEN R.[Start date] AND R.[End date] AND R.CountryID = C2.CountryID
WHERE (C2.Country = @Country)
GROUP BY C.Date, C2.Country
ORDER BY 3 DESC
OPTION (MAXRECURSION 0)
The ORDER BY 3 means order by the third column in the SELECT list, so if you remove the first two columns, change it accordingly.
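
If only the single number is wanted, another option (not what the answers above use) is to wrap the per-day query in a derived table and take MAX() over it. Below is a small sketch of that idea with Python's sqlite3 module. The Requests columns are renamed to simple identifiers, the country filter is hard-coded to an assumed CountryID of 1, and the rows are invented to reproduce the sample output from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Requests (CountryID integer, StartDate text, EndDate text,
                           PeopleNeeded integer);
    insert into Requests values
        (1, '2010-06-01', '2010-06-02', 3),
        (1, '2010-06-02', '2010-06-02', 1),
        (1, '2010-06-05', '2010-06-05', 19);
""")

peak = conn.execute("""
    with recursive Calendar(day) as (
        select '2010-06-01'
        union all
        select date(day, '+1 day') from Calendar
        where date(day, '+1 day') < '2010-06-06'
    )
    -- the outer query collapses the per-day totals to their maximum
    select max(daily.Allocated)
    from (
        select C.day, coalesce(sum(R.PeopleNeeded), 0) as Allocated
        from Calendar C
        left join Requests R
               on C.day between R.StartDate and R.EndDate
              and R.CountryID = 1
        group by C.day
    ) daily
""").fetchone()[0]

print(peak)  # 19
```

The per-day totals here are 3, 4, 0, 0 and 19, so the result matches the expected output in the question. In SQL Server the same shape should work with the original CTE, keeping OPTION (MAXRECURSION 0) at the very end of the statement.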