SQL Server: remove duplicates from count()

I'm creating a report in a SQL Server database. I will show its code first, then describe what it does and where the problem is.
SELECT
COUNT(e.flowid) AS [count],
t.name AS [process],
CAST(DATEPART(YEAR, e.dtcr) AS VARCHAR) + '-' + CAST(RIGHT('0' + RTRIM(DATEPART(MONTH, e.dtcr)), 2) AS VARCHAR) + '-' + CAST(RIGHT('0' + RTRIM(DATEPART(DAY, e.dtcr)), 2) AS VARCHAR) AS [day]
FROM
dbo.[Event] e
JOIN
dbo.Flow f ON e.flowid = f.id
JOIN
dbo.WorkOrder o ON f.workorderno = o.number
AND o.treenodeid IN (26067, 26152, 2469, 1815, 1913) -- only from requested processes
JOIN
dbo.TreeNode t ON o.treenodeid = t.id -- for process name in select statement
JOIN
dbo.Product p ON f.productid = p.id
AND p.materialid NOT IN (26094, 27262, 27515, 27264, 28192, 28195, 26090, 26092, 26093, 27065, 26969, 27471, 28351, 28353, 28356, 28976, 27486, 29345, 29346, 27069, 28653, 28654, 26735, 26745, 28686) -- exclude unwanted family codes
WHERE
e.pass = 1 -- only passed units
AND e.treenodeid IN (9036, 9037, 9038, 9039, 12594, 26330) -- only from requested events
AND e.dtcr BETWEEN '2015-12-01 00:00:00.000' AND '2016-05-31 23:59:59.999' -- only from requested time interval
GROUP BY
DATEPART(YEAR, e.dtcr), DATEPART(MONTH, e.dtcr), DATEPART(DAY, e.dtcr), t.name
ORDER BY
[day]
The query counts units that passed specific events within a time period (with some filters).
Important tables are:
Event - basically log for units passing specific events.
Product - list of units.
Output is something like this:
COUNT PROCESS DAY
71 Process-1 2015-12-01
1067 Process-2 2015-12-01
8 Process-3 2015-12-01
3 Process-4 2015-12-01
15 Process-1 2015-12-02
276 Process-2 2015-12-02
47 Process-3 2015-12-02
54 Process-4 2015-12-02
It works well, but there is an issue. In some specific cases a unit can pass the same event several times, and this query counts every such pass. I need to count every unit only once.
The "duplicated" records are in the Event table. They have different dates and ids. What the records I need to count only once have in common is flowid. Is there a simple way to achieve this?
Thank you for your time and answers!

To count each flowid only once, do count(distinct flowid), i.e.
SELECT
COUNT(distinct e.flowid) AS [count],
t.name AS [process],
CAST(DATEPART(YEAR, e.dtcr) AS VARCHAR) + '-' + CAST(RIGHT('0' + RTRIM(DATEPART(MONTH, e.dtcr)), 2) AS VARCHAR) + '-' + CAST(RIGHT('0' + RTRIM(DATEPART(DAY, e.dtcr)), 2) AS VARCHAR) AS [day]
FROM
...
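The difference between the two counts can be seen on a tiny data set. This is only a sketch: the table variable `@Event` and its three rows are hypothetical stand-ins for dbo.[Event], and `CAST(... AS DATE)` assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data: flow 1 passes the same event twice on one day.
DECLARE @Event TABLE (id INT, flowid INT, dtcr DATETIME);

INSERT INTO @Event (id, flowid, dtcr) VALUES
    (1, 1, '2015-12-01T08:00:00'),
    (2, 1, '2015-12-01T09:30:00'),  -- repeat pass of flow 1
    (3, 2, '2015-12-01T10:00:00');

SELECT
    CAST(dtcr AS DATE)     AS [day],
    COUNT(flowid)          AS all_passes,     -- counts every row: 3
    COUNT(DISTINCT flowid) AS distinct_units  -- counts each flowid once: 2
FROM @Event
GROUP BY CAST(dtcr AS DATE);
```

Note that the distinct count is evaluated per group, so a unit that passes on two different days is still counted once on each of those days.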

It sounds like you need the first time that something passes the threshold. You can get the first time using row_number(). This can be tricky with the additional conditions on the query. This modification might work for you:
select sum(case when seqnum = 1 then 1 else 0 end) as cnt,
. . .
from (select e.*,
row_number() over (partition by eventid order by e.dtcr) as seqnum
from event e
where e.pass = 1 and -- only passed units
e.treenodeid IN (9036, 9037, 9038, 9039, 12594, 26330) and
e.dtcr >= '2015-12-01' AND e.dtcr < '2016-06-01'
) e join
. . .
You don't specify how the same event is identified for the duplicates. The above uses eventid for this purpose.
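Since the question states that flowid is what the repeated records share, the same idea can be sketched with flowid as the partition key. The table variable `@Event` and its rows are hypothetical sample data, not the real dbo.[Event] table.

```sql
-- Hypothetical sample data: flow 1 passes again on the following day.
DECLARE @Event TABLE (id INT, flowid INT, dtcr DATETIME);

INSERT INTO @Event (id, flowid, dtcr) VALUES
    (1, 1, '2015-12-01T08:00:00'),
    (2, 1, '2015-12-02T09:30:00'),  -- second pass of flow 1
    (3, 2, '2015-12-02T10:00:00');

SELECT
    CAST(e.dtcr AS DATE) AS [day],
    -- only the earliest pass of each flowid (seqnum = 1) contributes to the count
    SUM(CASE WHEN e.seqnum = 1 THEN 1 ELSE 0 END) AS cnt
FROM (
    SELECT ev.*,
           ROW_NUMBER() OVER (PARTITION BY ev.flowid ORDER BY ev.dtcr) AS seqnum
    FROM @Event ev
) e
GROUP BY CAST(e.dtcr AS DATE)
ORDER BY [day];
```

With this data each day gets a count of 1, because flow 1 is attributed only to the day of its first pass. A per-day COUNT(DISTINCT flowid) would instead report 2 for the second day, so which variant is right depends on whether a unit should be counted once overall or once per day.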

Related

How to optimize my query speed (avoid using subselect for every row)?

I have a table called CisLinkLoadedData. It has Distributor, Network, Product, DocumentDate, Weight, AmountCP and Quantity columns. It is used to store daily product sales. AmountCP / Quantity is the price of the product on a certain date. There are promo and regular sales, but no flag to tell them apart. We can tell whether a record is regular or promo by comparing its price with the maximum recorded price within the month. I explained it in this picture.
I need a query that displays summarized regular and promo sales of a certain product per month. I wrote one, but it is very slow (6 minutes to execute on 1.6 million records).
I suspect this is because I use a subquery to determine the max price for every record, but I don't know how to do it another way.
This is what I made:
SELECT
Distributor,
Network,
Product,
cast(month(DocumentDate) as VARCHAR) + '.' + cast(year(DocumentDate) as VARCHAR) AS MonthYear,
SUM(Weight) AS MonthlyWeight,
IsPromo
FROM (SELECT
main_clld.Distributor,
main_clld.Network,
main_clld.Product,
main_clld.DocumentDate,
main_clld.Weight,
main_clld.Quantity,
main_clld.AmountCP,
CASE WHEN (main_clld.AmountCP / main_clld.Quantity) < (SELECT MAX(sub_clld.AmountCP / NULLIF(sub_clld.Quantity, 0)) FROM CisLinkLoadedData AS sub_clld WHERE sub_clld.Distributor = main_clld.Distributor AND sub_clld.Network = main_clld.Network AND sub_clld.Product = main_clld.Product AND cast(month(sub_clld.DocumentDate) as VARCHAR) + '.' + cast(year(sub_clld.DocumentDate) as VARCHAR) = cast(month(main_clld.DocumentDate) as VARCHAR) + '.' + cast(year(main_clld.DocumentDate) as VARCHAR) AND sub_clld.Quantity > 0 AND sub_clld.GCRecord IS NULL) THEN 1 ELSE 0 END AS IsPromo
FROM CisLinkLoadedData AS main_clld
WHERE main_clld.Quantity > 0 AND main_clld.GCRecord IS NULL) AS bad_query
GROUP BY
Distributor,
Network,
Product,
cast(month(DocumentDate) as VARCHAR) + '.' + cast(year(DocumentDate) as VARCHAR),
IsPromo;
What can be done in such a case? By the way, if you can produce a result table with a different structure, like (Distributor, Network, Product, MonthYear, RegularWeight, PromoWeight), that's even better. This is what I tried initially, but failed.
I use Microsoft SQL Server.
Rather than a correlated subquery, you could use a windowed function to retrieve the maximum price per group (each group is defined by the partition by clause):
MAX(main_clld.AmountCP / NULLIF(main_clld.Quantity, 0))
OVER(PARTITION BY main_clld.Distributor, main_clld.Network,
main_clld.Product, EOMONTH(main_clld.DocumentDate))
I think your full query would end up something like:
SELECT
Distributor,
Network,
Product,
MonthYear,
SUM(Weight) AS MonthlyWeight,
IsPromo
FROM (SELECT
main_clld.Distributor,
main_clld.Network,
main_clld.Product,
main_clld.DocumentDate,
main_clld.Weight,
main_clld.Quantity,
main_clld.AmountCP,
CAST(MONTH(main_clld.DocumentDate) AS VARCHAR(2)) + '.' + CAST(YEAR(main_clld.DocumentDate) AS VARCHAR(4)) AS MonthYear,
CASE WHEN (main_clld.AmountCP / main_clld.Quantity) < MAX(main_clld.AmountCP / NULLIF(main_clld.Quantity, 0))
OVER(PARTITION BY main_clld.Distributor, main_clld.Network,
main_clld.Product, EOMONTH(main_clld.DocumentDate))
THEN 1 ELSE 0 END AS IsPromo
FROM CisLinkLoadedData AS main_clld
WHERE main_clld.Quantity > 0
AND main_clld.GCRecord IS NULL
) AS bad_query
GROUP BY
Distributor,
Network,
Product,
MonthYear,
IsPromo;
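For the alternative layout the question asks about (RegularWeight and PromoWeight as separate columns), conditional aggregation over the same IsPromo flag might work. The following is a sketch under assumptions: `@Sales` is a hypothetical stand-in for CisLinkLoadedData with made-up rows and column types, the GCRecord filter is omitted for brevity, and EOMONTH requires SQL Server 2012 or later.

```sql
-- Hypothetical stand-in for CisLinkLoadedData; types and rows are assumptions.
DECLARE @Sales TABLE (
    Distributor  VARCHAR(20),
    Network      VARCHAR(20),
    Product      VARCHAR(20),
    DocumentDate DATE,
    Weight       DECIMAL(18, 3),
    AmountCP     DECIMAL(18, 2),
    Quantity     DECIMAL(18, 3)
);

INSERT INTO @Sales VALUES
    ('D1', 'N1', 'P1', '2016-01-05', 10, 100, 10),  -- price 10: month maximum, regular
    ('D1', 'N1', 'P1', '2016-01-20',  4,  32,  4),  -- price 8: below maximum, promo
    ('D1', 'N1', 'P1', '2016-02-03',  6,  54,  6);  -- price 9: only sale in February, regular

SELECT
    Distributor,
    Network,
    Product,
    MonthYear,
    SUM(CASE WHEN IsPromo = 0 THEN Weight ELSE 0 END) AS RegularWeight,
    SUM(CASE WHEN IsPromo = 1 THEN Weight ELSE 0 END) AS PromoWeight
FROM (
    SELECT
        s.Distributor,
        s.Network,
        s.Product,
        s.Weight,
        CAST(MONTH(s.DocumentDate) AS VARCHAR(2)) + '.'
            + CAST(YEAR(s.DocumentDate) AS VARCHAR(4)) AS MonthYear,
        -- a row is promo when its price is below the month's maximum price
        CASE WHEN s.AmountCP / s.Quantity
                  < MAX(s.AmountCP / s.Quantity)
                        OVER (PARTITION BY s.Distributor, s.Network,
                                           s.Product, EOMONTH(s.DocumentDate))
             THEN 1 ELSE 0 END AS IsPromo
    FROM @Sales s
    WHERE s.Quantity > 0
) AS priced
GROUP BY
    Distributor,
    Network,
    Product,
    MonthYear;
```

Because IsPromo is no longer part of the GROUP BY, each product and month collapses to a single row with both weights side by side.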

Add a leading zero to months less than 10 and trim 4 digit years to 2 digits

I need to manipulate the month and year that I fetch from a table and display them in a format like '12/20', '11/20', '09/20'.
For this I need to trim the year to its last 2 digits and add a leading zero to months less than 10.
SELECT
CAST(MONTH(O.AddDate) AS VARCHAR(2)) + '/' + CAST(YEAR(O.AddDate) AS VARCHAR(4)) AS TimeStamp
FROM
[Order] O
WHERE
O.CountryCode = 9009
GROUP BY
CAST(MONTH(O.AddDate) AS VARCHAR(2)) + '/' + CAST(YEAR(O.AddDate) AS VARCHAR(4))
This produces output in the format '10/2020', '8/2020', but I need it to be like '10/20', '08/20'.
You can use format():
select format(O.AddDate, 'MM/yy') as timestamp
from [Order] o
group by format(O.AddDate, 'MM/yy')
Obviously that's not your entire query; otherwise, if you have no aggregation function in the select clause, you can use select distinct instead of group by.
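If FORMAT is unavailable (it was introduced in SQL Server 2012) or turns out to be slow on large sets, a CONVERT date style may be an alternative. A minimal sketch, using a hypothetical variable `@d` in place of O.AddDate:

```sql
DECLARE @d DATETIME = '2020-08-15';  -- hypothetical sample value

SELECT
    FORMAT(@d, 'MM/yy')                  AS via_format,
    -- style 3 renders dd/mm/yy ('15/08/20'); the last five characters are mm/yy
    RIGHT(CONVERT(VARCHAR(8), @d, 3), 5) AS via_convert;  -- '08/20'
```

One thing to keep in mind is that the '/' in a FORMAT pattern is treated as the culture's date separator, so the first column can vary with the session language unless a culture argument is passed, whereas style 3 always uses a slash.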
IMHO formatting dates is a presentation layer concern. SQL Server should provide the data and the presentation layer should show the data to the user as required. So I would write the query like this:
SELECT DISTINCT
YEAR(O.AddDate) AS AddDateYear, MONTH(O.AddDate) AS AddDateMonth
FROM
[Order] O
WHERE
O.CountryCode = 9009
Or like this:
SELECT
YEAR(O.AddDate) AS AddDateYear, MONTH(O.AddDate) AS AddDateMonth
FROM
[Order] O
WHERE
O.CountryCode = 9009
GROUP BY
YEAR(O.AddDate), MONTH(O.AddDate)
However, if you insist on formatting the date as you requested, here is the query:
SELECT
RIGHT('0' + CAST(T.AddDateMonth AS varchar(2)), 2) + '/'
+ RIGHT(CAST(T.AddDateYear AS varchar(4)), 2) AS Y2kVulnerableTimestamp
FROM
(
SELECT DISTINCT
YEAR(O.AddDate) AS AddDateYear, MONTH(O.AddDate) AS AddDateMonth
FROM
[Order] O
WHERE
O.CountryCode = 9009
) T

Join two tables using a loop and SUM

I have a table (distance_travelled) with the columns
Primary Key | VehicleName | StartDate | Enddate | Total Distance
another table called Idling with columns
Vehicle Name | Duration | Timestamp
I have already made some progress, but the best way to ask the question is from scratch.
I want the output to be the following table with columns
VehicleName | StartDate | EndDate | TotalDistance | Duration (sum of durations between each StartDate and EndDate)
A CROSS APPLY may be a good fit here.
However, I get 18:15 for ID 2 (the sum of 8:15 and 10:00). Perhaps a typo/error in the original question, or additional logic is required.
I should note that the hours CAN exceed 24 just in case it spans multiple days.
Select A.*
,Duration = Format(IsNull(B.Seconds,0)/3600 ,'00') -- Hours 00 - 99
+Format(IsNull(B.Seconds,0)%3600/60,':00') -- Minutes
--+Format(IsNull(B.Seconds,0)%60 ,':00') -- Seconds
From Distance_Travelled A
Cross Apply (
Select Seconds = sum(DateDiff(SECOND,'1900-01-01',Duration))
From Idling
Where VehicleName = A.VehicleName
and TimeStamp between A.StartDate and A.EndDate
) B
Kind of nasty but you get the idea:
select
dt.id,
dt.VehicleName,
dt.StartDate,
dt.EndDate,
dt.Total_Distance,
substring(cast(convert(time,dateadd(millisecond,sum(datediff(millisecond,0,cast([Duration] as datetime))),0),108) as varchar),0,9) [Duration],
case when substring(cast(convert(time,dateadd(millisecond,sum(datediff(millisecond,0,cast([Duration] as datetime))),0),108) as varchar),0,9) is null then
'no duration...'
else
'sum between ' + convert(varchar, dt.StartDate, 108) + ' and ' + convert(varchar, dt.EndDate, 108)
end as [DurationNote]
from
distance_travelled dt
left join idling i on
dt.vehiclename = i.VehicleName and
i.TimeStamp between dt.StartDate and dt.EndDate
group by
dt.id,
dt.VehicleName,
dt.StartDate,
dt.EndDate,
dt.Total_Distance

SQL - Value difference between specific rows

My query is as follows
SELECT
LEFT(TimePeriod,6) Period, -- string field with YYYYMMDD
SUM(Value) Value
FROM
f_Trans_GL
WHERE
Account = 228
GROUP BY
TimePeriod
And it returns
Period Value
---------------
201412 80
201501 20
201502 30
201506 50
201509 100
201509 100
I'd like to know the Value difference between rows where the period is 1 month apart. The calculation being [value period] - [value period-1].
The desired output being;
Period Value Calculated
-----------------------------------
201412 80 80 - null = 80
201501 20 20 - 80 = -60
201502 30 30 - 20 = 10
201506 50 50 - null = 50
201509 100 (100 + 100) - null = 200
This illustrates a second challenge, as the period needs to be evaluated if the year changes (the difference between 201501 and 201412 is one month).
And the third challenge being a duplicate Period (201509), in which case the sum of that period needs to be evaluated.
Any indicators on where to begin, if this is possible, would be great!
Thanks in advance
===============================
After I accepted the answer, I tailored this a little to suit my needs, the end result is:
WITH cte
AS (SELECT
ISNULL(CAST(TransactionID AS nvarchar), '_nullTransactionId_') + ISNULL(Description, '_nullDescription_') + CAST(Account AS nvarchar) + Category + Currency + Entity + Scenario AS UID,
LEFT(TimePeriod, 6) Period,
SUM(Value1) Value1,
CAST(LEFT(TimePeriod, 6) + '01' AS date) ord_date
FROM MyTestTable
GROUP BY LEFT(TimePeriod, 6),
TransactionID,
Description,
Account,
Category,
Currency,
Entity,
Scenario,
TimePeriod)
SELECT
a.UID,
a.Period,
--a.Value1,
ISNULL(a.Value1, 0) - ISNULL(b.Value1, 0) Periodic
FROM cte a
LEFT JOIN cte b
ON a.ord_date = DATEADD(MONTH, 1, b.ord_date)
ORDER BY a.UID
I have to get the new value (Periodic) for each UID. This UID must be determined as done here because the PK on the table won't work.
But the issue is that this will return many more rows than I actually have to begin with in my table. If I don't add a GROUP BY and ORDER by UID (as done above), I can tell that the first result for each combination of UID and Period is actually correct, the subsequent rows for that combination, are not.
I'm not sure where to look for a solution, my guess is that the UID is the issue here, and that it will somehow iterate over the field... any direction appreciated.
As others pointed out, the first mistake is in the GROUP BY: you need to group by LEFT(timeperiod, 6) instead of timeperiod.
For the remaining calculation, try something like this:
;WITH cte
AS (SELECT LEFT(timeperiod, 6) Period,
Sum(value) Value,
Cast(LEFT(timeperiod, 6) + '01' AS DATE) ord_date
FROM f_trans_gl
WHERE account = 228
GROUP BY LEFT(timeperiod, 6))
SELECT a.period,
a.value,
a.value - Isnull(b.value, 0)
FROM cte a
LEFT JOIN cte b
ON a.ord_date = Dateadd(month, 1, b.ord_date)
If you are using SQL Server 2012 or later, this can also be done with the LAG analytic function.
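A possible LAG-based version is sketched below. LAG on its own returns the previous row, not the previous month, so the sketch also checks that the previous row is exactly one month earlier. The table variable `@Totals` is hypothetical and holds the question's monthly totals after grouping by LEFT(TimePeriod, 6), with the two 201509 rows already summed to 200.

```sql
-- Hypothetical input: the question's data, already aggregated per period.
DECLARE @Totals TABLE (Period CHAR(6), Value INT);

INSERT INTO @Totals (Period, Value) VALUES
    ('201412', 80), ('201501', 20), ('201502', 30),
    ('201506', 50), ('201509', 200);

SELECT
    t.Period,
    t.Value,
    t.Value - CASE
                  -- subtract the previous row only if it is the preceding month
                  WHEN LAG(t.ord_date) OVER (ORDER BY t.ord_date)
                       = DATEADD(MONTH, -1, t.ord_date)
                  THEN LAG(t.Value) OVER (ORDER BY t.ord_date)
                  ELSE 0
              END AS Calculated
FROM (
    SELECT Period, Value, CAST(Period + '01' AS DATE) AS ord_date
    FROM @Totals
) t
ORDER BY t.ord_date;
```

For this input the Calculated column comes out as 80, -60, 10, 50 and 200, matching the desired output in the question, and the year change from 201412 to 201501 is handled by the date arithmetic.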
Using a derived table, you can join the data to itself to find rows that are in the preceding period. I have converted your Period to a Date value so you can use SQL Server's dateadd function to check for rows in the previous month:
;WITH cte AS
(
SELECT
LEFT(TimePeriod,6) Period, -- string field with YYYYMMDD
CAST(LEFT(TimePeriod,6) + '01' AS DATE) PeriodDate,
SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
)
SELECT c1.Period,
c1.Value,
c1.Value - ISNULL(c2.Value,0) AS Calculation
FROM cte c1
LEFT JOIN cte c2
ON c1.PeriodDate = DATEADD(m,1,c2.PeriodDate)
Without cte, you can also try something like this
SELECT A.Period,A.Value,A.Value-ISNULL(B.Value,0) Calculated
FROM
(
SELECT LEFT(TimePeriod,6) Period,
DATEADD(M,-1,(CONVERT(date,LEFT(TimePeriod,6)+'01'))) PeriodDatePrev,SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
) AS A
LEFT OUTER JOIN
(
SELECT LEFT(TimePeriod,6) Period,
(CONVERT(date,LEFT(TimePeriod,6)+'01')) PeriodDate,SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
) AS B
ON (A.PeriodDatePrev = B.PeriodDate)
ORDER BY 1

Use sum case sql in Crystal command to show sales in this year and last year?

I am using the SQL code below in the Crystal "Command" to display the current year's sales units and dollars (all closed sales versus sales closed using a discount). I need to add last year's unit sales quantity. Does anyone have any idea of the best way to do this? Thanks to anyone who has any ideas.
Code:
SELECT
convert(char(4),datepart(yy,m.close_dt)) +
right('00' + convert(varchar,datepart(m,m.close_dt)),2) AS SortMonth,
replace(right(convert(varchar(11), m.close_dt, 106), 8), ' ', '-') AS DisplayMonth,
sum(case when lt.um_ch177= 'BAWRM' then 1 else 0 end) as Close_Units_Disc ,
sum(case when lt.um_ch177= 'BAWRM' then m.tot_ln_amt else 0 end) as Close_Dollars_Disc,
sum(case when m.close_dt >= '{?Date1}'
and m.close_dt <= '{?Date2}' then 1 else 0 end) as Close_Units_All,
sum(case when m.close_dt >= '{?Date1}'
and m.close_dt <= '{?Date2}' then tot_ln_amt else 0 end) as Close_Dollars_All
FROM
pro2sql.lt_master m WITH (NOLOCK)
LEFT OUTER JOIN pro2sql.ltuch_master lt WITH (NOLOCK) ON m.lt_acnt=lt.lt_acnt
WHERE
m.stage = 60
and m.loan_purpose <> 7
and m.app_number <> 0
and m.brch_entry {?BranchList}
and m.close_dt >= '{?Date1}'
and m.close_dt <'{?Date2}'
Group by
convert(char(4),datepart(yy,m.close_dt)) + right('00' + convert(varchar,datepart(m,m.close_dt)),2)
,replace(right(convert(varchar(11), m.close_dt, 106), 8), ' ', '-')
I can't upload a pic - not sure if this is going to be a jumble but here is the output - the last two columns are what I need to add:
DisplayMonth Close_Units_Disc Close_Dollars_Disc Close_Units_All Close_Dollars_All %Units %Dollars DisplayMonth LY CloseUnits All
Feb-2014 115 $48,919,800 190 $83,942,650 61% 58% Feb-2013
Mar-2014 202 $91,077,780 238 $109,300,903 85% 83% Mar-2013
Apr-2014 219 $89,157,481 238 $95,892,509 92% 93% Apr-2013
"Enterprise system" is not an adequate answer. What type of database is this? Oracle? Microsoft (and which version if it's MSS)? Something else?
That may influence what solutions are possible, or at least the syntax.
You have at least two options off the top of my head though:
One:
Expand the date range to pull in the previous year's data;
modify all of your sum case statements to include the current year only.
add an additional sum case with appropriate date range conditions for the previous year's units.
Two:
duplicate your query as a subquery for the previous year and join to it
e.g:
LEFT OUTER JOIN (
SELECT replace(right(convert(varchar(11), m.close_dt, 106), 8), ' ', '-') AS DisplayMonth,
sum(case when lt.um_ch177= 'BAWRM' then 1 else 0 end) as Close_Units_Disc
FROM pro2sql.lt_master m WITH (NOLOCK)
LEFT OUTER JOIN pro2sql.ltuch_master lt WITH (NOLOCK) ON m.lt_acnt=lt.lt_acnt
WHERE m.stage = 60 and m.loan_purpose <> 7 and m.app_number <> 0
and m.brch_entry {?BranchList}
and m.close_dt >= '{?Date3}' --//#last year's start date
and m.close_dt <'{?Date4}' --//#last year's end date
Group by
convert(char(4),datepart(yy,m.close_dt)) +
right('00' + convert(varchar,datepart(m,m.close_dt)),2),
replace(right(convert(varchar(11), m.close_dt, 106), 8), ' ', '-')
) as lastYear
on replace(right(convert(varchar(11), dateadd(year, -1, m.close_dt), 106), 8), ' ', '-') = lastYear.DisplayMonth
If you go with one of these, you may want to consult a DBA at your company to see which would be more efficient. I don't know whether running one query with a larger result is more intensive than running nearly identical queries twice, and that too may change from one architecture to another (or even just with different environment variables/parameters configured on the server).
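Option One (a wider date range plus date-restricted sum case expressions) could look roughly like the sketch below. It is plain T-SQL rather than a Crystal command: `@Loans`, `@Date1`, `@Date2` and the sample rows are hypothetical stand-ins for pro2sql.lt_master and the report parameters. It also assumes the requested range is at most twelve months long and stays within one calendar year, so that grouping by month number lines up this year's and last year's rows.

```sql
-- Hypothetical current-year range [@Date1, @Date2) and sample data.
DECLARE @Date1 DATE = '2014-02-01',
        @Date2 DATE = '2014-05-01';

DECLARE @Loans TABLE (close_dt DATE, tot_ln_amt MONEY);

INSERT INTO @Loans (close_dt, tot_ln_amt) VALUES
    ('2014-02-10', 100),
    ('2014-02-20', 200),
    ('2013-02-15', 150),  -- same month, previous year
    ('2014-03-05', 300);

SELECT
    MONTH(close_dt) AS SortMonth,
    -- current-year measures: only rows inside the requested range
    SUM(CASE WHEN close_dt >= @Date1 THEN 1 ELSE 0 END)          AS Close_Units_All,
    SUM(CASE WHEN close_dt >= @Date1 THEN tot_ln_amt ELSE 0 END) AS Close_Dollars_All,
    -- previous-year measure: rows inside the same range shifted back one year
    SUM(CASE WHEN close_dt < DATEADD(YEAR, -1, @Date2) THEN 1 ELSE 0 END)
                                                                 AS LY_Close_Units_All
FROM @Loans
WHERE (close_dt >= @Date1 AND close_dt < @Date2)
   OR (close_dt >= DATEADD(YEAR, -1, @Date1)
       AND close_dt < DATEADD(YEAR, -1, @Date2))
GROUP BY MONTH(close_dt)
ORDER BY SortMonth;
```

The table is read once and the WHERE clause admits both years, while each CASE decides which year a row contributes to. In the real command the other filters (stage, loan_purpose, branch list) would stay in the WHERE clause unchanged.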