avg of count incorrect results - sql

I need to get the brand and type of all consoles that are repaired less often than average. Console types that have never been repaired have to count towards this average as well.
So I get the brand and type from the console table, and I join this to the Items table (the artikel table in this DB).
Then I left join the items to the Repairs table, because I also need the console types that haven't been repaired. I'm not sure if this is correct.
To get the number of repairs per console type, I did a count on repaired_items_id in the Repairs table (repareerd_artikel_id in the picture), grouped by the same column, and then took the average of that count.
This is my query. I also tried different GROUP BY combinations, but the results are always wrong.
select merk,type from console c join artikel a on
a.CONSOLE_ID=c.CONSOLE_ID left join REPARATIE r on
REPAREERD_ARTIKEL_ID=a.ARTIKEL_ID group by MERK,TYPE
HAVING (select avg(A.rcount) from (select
count(repareerd_artikel_id) AS rcount from REPARATIE group by
REPAREERD_ARTIKEL_ID) A) < (select avg(A.rcount) from (select
count(repareerd_artikel_id) AS rcount from REPARATIE group by
REPAREERD_ARTIKEL_ID) A)
And then I also tried starting with a count instead.
HAVING count(repareerd_artikel_id)< (select avg(A.rcount) from
(select count(repareerd_artikel_id) AS rcount from REPARATIE group by
REPAREERD_ARTIKEL_ID) A)
I have no idea what to do anymore now so any help would be much appreciated.

Using your query (and, to be honest, without checking your images, as they are a pain to navigate to from SO), I think you could do something like this, breaking the problem down into three stages:
Work out the number of repairs for each console, including zeroes;
Work out the average number of repairs across all consoles;
List any consoles whose number of repairs is under the average.
WITH AllRepairs AS (
SELECT
merk,
[type],
COUNT(r.repareerd_artikel_id) AS repairs -- COUNT never returns NULL, so no ISNULL is needed
FROM
console c
INNER JOIN artikel a ON a.CONSOLE_ID = c.CONSOLE_ID
LEFT JOIN REPARATIE r ON r.REPAREERD_ARTIKEL_ID = a.ARTIKEL_ID
GROUP BY
merk,
[type]),
AverageRepairs AS (
SELECT
AVG(repairs) AS average_repairs
FROM
AllRepairs)
SELECT
a.*
FROM
AllRepairs a
CROSS JOIN AverageRepairs ar
WHERE
a.repairs < ar.average_repairs
ORDER BY
a.repairs;
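One thing to watch out for is comparing integers to a decimal. In SQL Server, AVG over an integer column returns an integer, so an average of 2.9 is truncated to 2 and only consoles with fewer than 2 repairs count as below average. That may be what you want; if it is not, cast repairs to a decimal type before averaging.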
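The three stages and the integer-versus-decimal caveat can be tried out on a tiny data set. The following is only a sketch: it uses Python's built-in sqlite3 module rather than SQL Server, and the three consoles and four repairs are made-up rows, not data from the question. SQLite's AVG always returns a floating-point value, so the CAST is there only to show what would be needed in T-SQL.

```python
import sqlite3

# Hypothetical miniature of the console / artikel / REPARATIE tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE console (console_id INTEGER, merk TEXT, type TEXT);
CREATE TABLE artikel (artikel_id INTEGER, console_id INTEGER);
CREATE TABLE reparatie (repareerd_artikel_id INTEGER);
INSERT INTO console VALUES
    (1, 'Sony', 'PS3'), (2, 'Microsoft', 'XBox 360'), (3, 'Nintendo', 'Switch');
INSERT INTO artikel VALUES (301, 1), (302, 2), (303, 3);
-- artikel 302 is repaired three times, 301 once, 303 never
INSERT INTO reparatie VALUES (302), (302), (302), (301);
""")

BELOW_AVERAGE = """
WITH AllRepairs AS (
    -- stage 1: repairs per console type, zeroes included via LEFT JOIN
    SELECT c.merk, c.type, COUNT(r.repareerd_artikel_id) AS repairs
    FROM console c
    JOIN artikel a ON a.console_id = c.console_id
    LEFT JOIN reparatie r ON r.repareerd_artikel_id = a.artikel_id
    GROUP BY c.merk, c.type
)
-- stages 2 and 3: compare each count with the non-truncated average
SELECT merk, type, repairs
FROM AllRepairs
WHERE repairs < (SELECT AVG(CAST(repairs AS REAL)) FROM AllRepairs)
ORDER BY repairs, merk
"""

below_average = conn.execute(BELOW_AVERAGE).fetchall()
print(below_average)
```

With these rows the counts are 0, 1 and 3, so the average is about 1.33 and both the Switch and the PS3 are listed. With an average truncated to 1, only the Switch would qualify.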

The way I am viewing your data, it looks like you do not have any repairs for the items in your items table. So I added 2 repairs to one item and 1 repair to another. My query lists all 11 Merk/Type combinations, averages the repair counts of those that have at least one repair (2 and 1, giving 1.5), and then compares this average to the number of repairs for each. The record with 2 repairs is returned in the result.
Create Table #console
(
Console_Id Int,
Merk Varchar(25),
Type VarChar(25),
Kleur VarChar(10),
Jaar_Uitgave Int,
Maat VarChar(10)
)
Insert Into #console Values
(1,'Sony 1','PS4 Slim','Wit',2016,'Slim'),
(2,'Microsoft','XBox','Beige',2004,'Port'),
(3,'Microsoft','XBox 360','Zwart',2011,'Pro'),
(4,'Microsoft','XBox One','Wit',2014,'Pro'),
(5,'Microsoft','XBox One X','Wit',2017,'Pro'),
(6,'Nintendo','NES Classic Edition','Wit',2016,'XL'),
(7,'Nintendo','Switch','Wit',2017,'XL'),
(8,'Nintendo','WII','Wit',2011,'Slim'),
(9,'Nintendo','WII Mini','Wit',2015,'XL'),
(10,'Nintendo','WII U','Wit',2013,'Slim'),
(11,'Sony','PS3','Wit',2013,'Port')
Create Table #Items
(
Artikel_ID Int,
BarCode VarChar(20),
Prijs Float,
Prijs_Per_D Float,
Spel_Of_Console VarChar(25),
Spel_ID Int,
Console_Id Int
)
Insert Into #Items Values
(301,'10000008',300.00,11.00,'Console',Null,3),
(302,'10000017',400.00,15.00,'Console',Null,4),
(303,'10000026',270.00,9.00,'Console',Null,9),
(304,'10000035',200.00,5.00,'Console',Null,6),
(305,'10000044',200.00,5.00,'Console',Null,11),
(306,'10000053',300.00,11.00,'Console',Null,12),
(307,'10000023',60.00,2.00,'Spel',15,Null),
(308,'10000242',36.00,2.00,'Spel',16,Null),
(309,'10000278',35.00,2.00,'Spel',21,Null),
(310,'10000107',66.00,4.00,'Spel',36,Null),
(311,'10000215',45.00,3.00,'Spel',40,Null)
Create Table #Repairs
(
Medewerker_Id Int,
Repareerd_Artikel_Id Int,
Schadenummer Int,
Huurovereenkomst_Id Int,
datum_Gereed DateTime,
Kosten Float,
Reparatiestatus VarChar(25)
)
Insert Into #Repairs Values
(1,259,7,12,'2017-08-03 00:00:00',112.00,'GEREED'),
(2,260,9,14,'2016-09-29 00:00:00',84.00,'GEREED'),
(3,288,19,28,'2017-04-09 00:00:00',96.00,'GEREED'),
(4,292,21,30,'2018-01-27 00:00:00',110.00,'GEREED'),
(5,283,16,24,'2015-12-29 00:00:00',103.00,'GEREED'),
(6,245,1,2,'2017-01-31 00:00:00',160.00,'GEREED'),
(7,245,2,3,'2018-01-18 00:00:00',120.00,'GEREED'),
(8,275,11,19,'2016-04-15 00:00:00',75.00,'GEREED'),
(9,276,12,20,'2015-08-25 00:00:00',174.00,'GEREED'),
(10,283,15,23,'2014-06-10 00:00:00',74.00,'GEREED'),
(11,297,21,34,'2014-07-17 00:00:00',96.00,'GEREED')
Insert Into #Repairs Values
(14,305,21,34,'2014-07-25 00:00:00',96.00,'GEREED'),
(12,301,21,34,'2014-07-17 00:00:00',96.00,'GEREED'),
(13,301,21,34,'2014-07-25 00:00:00',96.00,'GEREED')
Query
;With cte As
(
select
c.Merk, c.Type,
Count(r.REPAREERD_ARTIKEL_ID) As cnt
from
#console c Left join
#Items a on a.CONSOLE_ID=c.CONSOLE_ID left join
#Repairs r on r.REPAREERD_ARTIKEL_ID=a.ARTIKEL_ID
group by
c.merk, c.type
)
Select
*,
(Select Count(*) As totrecs From cte) As cntRecs ,
(Select avg(Cast(cte.cnt As Float)) As avgrecs From cte Where cte.cnt > 0) as avgrecs
From
cte
Where cte.cnt > (Select avg(Cast(cte.cnt As Float)) As avgrecs From cte Where cte.cnt > 0)
Result:
Merk       Type      cnt  cntRecs  avgrecs
Microsoft  XBox 360  2    11       1.5

Related

SQL - Get the sum of several groups of records

DESIRED RESULT
Get the hours SUM of all [Hours] including only a single result from each [DevelopmentID] where [Revision] is highest value
e.g SUM 1, 2, 3, 5, 6 (Result should be 22.00)
I'm stuck trying to get the appropriate grouping.
DECLARE @CompanyID INT = 1
SELECT
SUM([s].[Hours]) AS [Hours]
FROM
[dbo].[tblDev] [d] WITH (NOLOCK)
JOIN
[dbo].[tblSpec] [s] WITH (NOLOCK) ON [d].[DevID] = [s].[DevID]
WHERE
[s].[Revision] = (
SELECT MAX([s2].[Revision]) FROM [tblSpec] [s2]
)
GROUP BY
[s].[Hours]
Use ROW_NUMBER() to identify the latest revision of each DevID:
SELECT SUM([Hours])
FROM (
SELECT s.[Hours], R = ROW_NUMBER() OVER (PARTITION BY d.DevID
ORDER BY s.Revision DESC) -- DESC so that R = 1 is the highest revision
FROM [dbo].[tblDev] d
JOIN [dbo].[tblSpec] s
ON d.[DevID] = s.[DevID]
) d
WHERE R = 1
If you want one row per DevId, then that should be in the GROUP BY (and presumably in the SELECT as well):
SELECT s.DevId, SUM(s.Hours) as hours
FROM [dbo].[tblDev] d JOIN
[dbo].[tblSpec] s
ON [d].[DevID] = [s].[DevID]
WHERE s.Revision = (SELECT MAX(s2.Revision) FROM tblSpec s2 WHERE s2.DevID = s.DevID)
GROUP BY s.DevId;
Also, don't use WITH (NOLOCK) unless you really know what you are doing, and I'm guessing you do not. It is basically a license that says: "You can get me data even if it is not 100% accurate."
I would also dispense with all the square brackets. They just make the query harder to write and to read.
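To check the "highest revision per DevID, then sum" logic in isolation, here is a small sketch using Python's sqlite3 module instead of SQL Server. The six tblSpec rows are hypothetical (the question's real data is not shown), and the join to tblDev is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblSpec (DevID INTEGER, Revision INTEGER, Hours REAL);
-- hypothetical rows: only the highest Revision per DevID should be summed
INSERT INTO tblSpec VALUES
    (1, 1, 4.0), (1, 2, 5.0),
    (2, 1, 3.0),
    (3, 1, 1.0), (3, 2, 2.0), (3, 3, 6.0);
""")

LATEST_HOURS = """
SELECT SUM(Hours)
FROM (
    SELECT Hours,
           -- r = 1 marks the row with the highest Revision for each DevID
           ROW_NUMBER() OVER (PARTITION BY DevID ORDER BY Revision DESC) AS r
    FROM tblSpec
) latest
WHERE r = 1
"""

total_hours = conn.execute(LATEST_HOURS).fetchone()[0]
print(total_hours)  # 5.0 + 3.0 + 6.0
```

Ordering the window ascending instead would pick revision 1 of every DevID and sum 4.0 + 3.0 + 1.0, which is the easy mistake to make here.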

Calculating current consecutive days from a table

I have what seems to be a common business request, but I can't find a clear solution. I have a daily report (among many) that is generated based on failed criteria and saved to a table. Each report has a type id tied to it to signify which report it is, and there is an import event id that signifies the day the imports came in (a date column is added for extra clarification). I've added a sqlfiddle showing the basic schema of the table (renamed for privacy reasons).
http://www.sqlfiddle.com/#!3/81945/8
All reports currently generated are working fine, so nothing needs to be modified on the table. However, for one report (type 11), I not only need to pull the invoices that showed up today, I also need to add a column that counts the consecutive days, up to and including the day of the run, on which that invoice appeared. The result should look like the following, based on the schema provided:
INVOICE  MESSAGE  EVENT_DATE     CONSECUTIVE_DAYS_ON_REPORT
12345    Yes      July, 30 2013  6
54355    Yes      July, 30 2013  2
644644   Yes      July, 30 2013  4
I only need the latest run of consecutive days, not any other run that may show up. I've tried self joins to no avail; my last attempt is included in the sqlfiddle. Any suggestions or ideas? I'm quite stuck at the moment.
FYI: I am working in SQL Server 2000! I have seen a lot of neat tricks that have come out in 2005 and 2008, but I can't access them.
Your help is greatly appreciated!
Something like this? http://www.sqlfiddle.com/#!3/81945/14
SELECT
[final].*,
[last].total_rows
FROM
tblEventInfo AS [final]
INNER JOIN
(
SELECT
[first_of_last].type_id,
[first_of_last].invoice,
MAX([all_of_last].event_date) AS event_date,
COUNT(*) AS total_rows
FROM
(
SELECT
[current].type_id,
[current].invoice,
MAX([current].event_date) AS event_date
FROM
tblEventInfo AS [current]
LEFT JOIN
tblEventInfo AS [previous]
ON [previous].type_id = [current].type_id
AND [previous].invoice = [current].invoice
AND [previous].event_date = [current].event_date-1
WHERE
[current].type_id = 11
AND [previous].type_id IS NULL
GROUP BY
[current].type_id,
[current].invoice
)
AS [first_of_last]
INNER JOIN
tblEventInfo AS [all_of_last]
ON [all_of_last].type_id = [first_of_last].type_id
AND [all_of_last].invoice = [first_of_last].invoice
AND [all_of_last].event_date >= [first_of_last].event_date
GROUP BY
[first_of_last].type_id,
[first_of_last].invoice
)
AS [last]
ON [last].type_id = [final].type_id
AND [last].invoice = [final].invoice
AND [last].event_date = [final].event_date
The innermost query looks up the starting record of the last block of consecutive records.
Then that joins on to all the records in that block of consecutive records, giving the final date and the count of rows (consecutive days).
Then that joins on to the row for the last day to get the message, etc.
Make sure that in reality you have an index on (type_id, invoice, event_date).
You have multiple problems. Tackle them separately and build up.
Problems:
1) Identifying consecutive ranges: subtract the row_number from the range column and group by the result
2) No ROW_NUMBER() functions in SQL 2000: Fake it with a correlated subquery.
3) You actually want DENSE_RANK() instead of ROW_NUMBER: Make a list of unique dates first.
Solutions:
3)
SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date
2)
SELECT t2.invoice,t2.event_date,t2.id,
DATEDIFF(day,(SELECT COUNT(DISTINCT event_date) FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t1 WHERE t2.invoice = t1.invoice AND t2.event_date > t1.event_date),t2.event_date) grp
FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t2
ORDER BY invoice,grp,event_date
1)
SELECT
t3.invoice AS INVOICE,
MAX(t3.event_date) AS EVENT_DATE,
COUNT(t3.event_date) AS CONSECUTIVE_DAYS_ON_REPORT
FROM (
SELECT t2.invoice,t2.event_date,t2.id,
DATEDIFF(day,(SELECT COUNT(DISTINCT event_date) FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t1 WHERE t2.invoice = t1.invoice AND t2.id > t1.id),t2.event_date) grp
FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t2
) t3
GROUP BY t3.invoice,t3.grp
The rest of your question is a little ambiguous. If two ranges are of equal length, do you want both or just the most recent? Should the output MESSAGE be 'Yes' if any message = 'Yes' or only if the most recent message = 'Yes'?
This should give you enough of a breadcrumb, though.
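The idea in point 1 (subtract a row number from the date and group by the result) is easier to see with window functions available. The sketch below uses Python's sqlite3 module, so it is not SQL Server 2000 compatible and only illustrates the technique. The rows are hypothetical, and julianday is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblEventInfo (invoice INTEGER, event_date TEXT);
-- hypothetical rows: invoice 12345 has a gap on 2013-07-28,
-- invoice 54355 appears twice on the same day
INSERT INTO tblEventInfo VALUES
    (12345, '2013-07-25'), (12345, '2013-07-26'), (12345, '2013-07-27'),
    (12345, '2013-07-29'), (12345, '2013-07-30'),
    (54355, '2013-07-30'), (54355, '2013-07-30');
""")

LATEST_RUN = """
WITH days AS (
    -- unique dates first, so ROW_NUMBER behaves like DENSE_RANK
    SELECT DISTINCT invoice, event_date FROM tblEventInfo
), grouped AS (
    -- consecutive dates share the same (day number - row number) value
    SELECT invoice, event_date,
           CAST(julianday(event_date) AS INTEGER)
             - ROW_NUMBER() OVER (PARTITION BY invoice ORDER BY event_date) AS grp
    FROM days
), runs AS (
    SELECT invoice, MAX(event_date) AS last_day, COUNT(*) AS consecutive_days
    FROM grouped
    GROUP BY invoice, grp
)
SELECT invoice, last_day, consecutive_days
FROM runs
WHERE last_day = '2013-07-30'   -- keep only the run that ends on the report date
ORDER BY invoice
"""

latest_runs = conn.execute(LATEST_RUN).fetchall()
print(latest_runs)
```

Invoice 12345 has an earlier three-day run and a later two-day run; only the later one ends on the report date, so its count is 2.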
I had a similar requirement not long ago, getting a "Top 5" ranking with a consecutive number of periods in the Top 5. The only solution I found was to do it in a cursor. The cursor has a date = @daybefore, and inside the cursor, if your data does not match, quit the loop; otherwise set @daybefore = datediff(dd, -1, @daybefore).
Let me know if you want an example. There just seem to be a large number of enthusiasts, who hit downvote when they see the word "cursor" even if they don't have a better solution...
Here, try a scalar function like this:
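CREATE FUNCTION ConsequtiveDays
(
@invoice bigint, @date datetime
)
RETURNS int
AS
BEGIN
-- SQL Server 2000 cannot initialise variables inside DECLARE
DECLARE @ct int, @Count_Date datetime, @Last_Date datetime
SELECT @ct = 0, @Last_Date = @date
DECLARE counter CURSOR LOCAL FAST_FORWARD
FOR
SELECT event_date FROM tblEventInfo
WHERE invoice = @invoice AND event_date <= @date
ORDER BY event_date DESC
OPEN counter
FETCH NEXT FROM counter
INTO @Count_Date
-- walk backwards while each date is at most one day before the previous one
WHILE @@FETCH_STATUS = 0 AND DATEDIFF(dd,@Count_Date,@Last_Date) < 2
BEGIN
SET @ct = @ct + 1
SET @Last_Date = @Count_Date
FETCH NEXT FROM counter
INTO @Count_Date
END
CLOSE counter
DEALLOCATE counter
RETURN @ct
END
GO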

Find Segment with Longest Stay Per Booking

We have a number of bookings, and one of the requirements is that we display the Final Destination for a booking based on its segments. Our business has defined the Final Destination as the city with the longest stay, and the Origin as the first departure point.
Please note this is not the segment with the longest travel time, i.e. DATEDIFF(minute, DepartDate, ArrivalDate). What is needed is the stop with the longest gap between segments.
This is a simplified version of the tables:
Create Table Segments
(
BookingID int,
SegNum int,
DepartureCity varchar(100),
DepartDate datetime,
ArrivalCity varchar(100),
ArrivalDate datetime
);
Create Table Bookings
(
BookingID int identity(1,1),
Locator varchar(10)
);
Insert into Segments values (1,2,'BRU','2010-03-06 10:40','FIH','2010-03-06 20:20:00')
Insert into Segments values (1,4,'FIH','2010-03-13 21:50:00','BRU', '2010-03-14 07:25:00')
Insert into Segments values (2,2,'BOD','2010-02-10 06:50:00','AMS','2010-02-10 08:50:00')
Insert into Segments values (2,3,'AMS','2010-02-10 10:40:00','EBB','2010-02-10 20:40:00')
Insert into Segments values (2,4,'EBB','2010-02-28 22:55:00','AMS','2010-03-01 05:35:00')
Insert into Segments values (2,5,'AMS','2010-03-01 10:25:00','BOD','2010-03-01 12:15:00')
insert into Segments values (3,2,'BRU','2010-03-09 12:10:00','IAD','2010-03-09 14:46:00')
Insert into Segments Values (3,3,'IAD','2010-03-13 17:57:00','BRU','2010-03-14 07:15:00')
insert into segments values (4,2,'BRU','2010-07-27','ADD','2010-07-28')
insert into segments values (4,4,'ADD','2010-07-28','LUN','2010-07-28')
insert into segments values (4,5,'LUN','2010-08-23','ADD','2010-08-23')
insert into segments values (4,6,'ADD','2010-08-23','BRU','2010-08-24')
Insert into Bookings values('5MVL7J')
Insert into Bookings values ('Y2IMXQ')
insert into bookings values ('YCBL5C')
Insert into bookings values ('X7THJ6')
I have created a SQL Fiddle with real data here:
SQL Fiddle Example
I have tried to do the following, however this doesn't appear to be correct.
SELECT Locator, fd.*
FROM Bookings ob
OUTER APPLY
(
SELECT Top 1 DepartureCity, ArrivalCity
from
(
SELECT DISTINCT
seg.segnum ,
seg.DepartureCity ,
seg.DepartDate ,
seg.ArrivalCity ,
seg.ArrivalDate,
(SELECT
DISTINCT
DATEDIFF(MINUTE , seg.ArrivalDate , s2.DepartDate)
FROM Segments s2
WHERE s2.BookingID = seg.BookingID AND s2.segnum = seg.segnum + 1) 'LengthOfStay'
FROM Bookings b(NOLOCK)
INNER JOIN Segments seg (NOLOCK) ON seg.bookingid = b.bookingid
WHERE b.Locator = ob.locator
) a
Order by a.lengthofstay desc
)
FD
The results I expect are:
Locator  Origin  Destination
5MVL7J   BRU     FIH
Y2IMXQ   BOD     EBB
YCBL5C   BRU     IAD
X7THJ6   BRU     LUN
I get the feeling that a CTE would be the best approach, but my attempts to do this have so far failed miserably. Any help would be greatly appreciated.
I have managed to get the following query working but it only works for one at a time due to the top one, but I'm not sure how to tweak it:
WITH CTE AS
(
SELECT distinct s.DepartureCity, s.DepartDate, s.ArrivalCity, s.ArrivalDate, b.Locator , ROW_NUMBER() OVER (PARTITION BY b.Locator ORDER BY SegNum ASC) RN
FROM Segments s
JOIN bookings b ON s.bookingid = b.BookingID
)
SELECT C.Locator, c.DepartureCity, a.ArrivalCity
FROM
(
SELECT TOP 1 C.Locator, c.ArrivalCity, c1.DepartureCity, DATEDIFF(MINUTE,c.ArrivalDate, c1.DepartDate) 'ddiff'
FROM CTE c
JOIN cte c1 ON c1.Locator = C.Locator AND c1.rn = c.rn + 1
ORDER BY ddiff DESC
) a
JOIN CTE c ON C.Locator = a.Locator
WHERE c.rn = 1
You can try something like this:
;WITH CTE_Start AS
(
--Ordering of segments to eliminate gaps
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY SegNum) RN
FROM dbo.Segments
)
, RCTE_Stay AS
(
--recursive CTE to calculate stay between segments
SELECT *, 0 AS Stay FROM CTE_Start s WHERE RN = 1
UNION ALL
SELECT sNext.*, DATEDIFF(Mi, s.ArrivalDate, sNext.DepartDate)
FROM CTE_Start sNext
INNER JOIN RCTE_Stay s ON s.RN + 1 = sNext.RN AND s.BookingID = sNext.BookingID
)
, CTE_Final AS
(
--Search for max(stay) for each bookingID
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY Stay DESC) AS RN_Stay
FROM RCTE_Stay
)
--join Start and Final on RN=1 to find origin and destination
SELECT b.Locator, s.DepartureCity AS Origin, f.DepartureCity AS Destination
FROM CTE_Final f
INNER JOIN CTE_Start s ON f.BookingID = s.BookingID
INNER JOIN dbo.Bookings b ON b.BookingID = f.BookingID
WHERE s.RN = 1 AND f.RN_Stay = 1
SQLFiddle DEMO
You can use OUTER APPLY with TOP 1 to find the next segment (the next-highest SegNum) for each segment. Once the gap between segments is known, the MIN and MAX aggregate functions with an OVER clause are used as conditions in the CASE expressions to pick the origin and the destination.
;WITH cte AS
(
SELECT seg.BookingID,
CASE WHEN MIN(seg.segNum) OVER(PARTITION BY seg.BookingID) = seg.segNum
THEN seg.DepartureCity END AS Origin,
CASE WHEN MAX(DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)) OVER(PARTITION BY seg.BookingID)
= DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)
THEN o.DepartureCity END AS Destination
FROM Segments seg (NOLOCK)
OUTER APPLY (
SELECT TOP 1 seg2.DepartDate, seg2.DepartureCity
FROM Segments seg2
WHERE seg.BookingID = seg2.BookingID
AND seg.SegNum < seg2.SegNum
ORDER BY seg2.SegNum ASC
) o
)
SELECT b.Locator, MAX(c.Origin) AS Origin, MAX(c.Destination) AS Destination
FROM cte c JOIN Bookings b ON c.BookingID = b.BookingID
GROUP BY b.Locator
See demo on SQLFiddle
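As a cross-check of the expected results, the same "longest gap before the next departure" logic can be written with LEAD. This is a sketch in Python's sqlite3 module using the sample rows from the question; LEAD needs SQL Server 2012 or later, and julianday is SQLite's stand-in for DATEDIFF here, so treat it as an illustration rather than a drop-in query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Segments (BookingID INTEGER, SegNum INTEGER, DepartureCity TEXT,
                       DepartDate TEXT, ArrivalCity TEXT, ArrivalDate TEXT);
CREATE TABLE Bookings (BookingID INTEGER PRIMARY KEY, Locator TEXT);
INSERT INTO Segments VALUES
 (1,2,'BRU','2010-03-06 10:40','FIH','2010-03-06 20:20:00'),
 (1,4,'FIH','2010-03-13 21:50:00','BRU','2010-03-14 07:25:00'),
 (2,2,'BOD','2010-02-10 06:50:00','AMS','2010-02-10 08:50:00'),
 (2,3,'AMS','2010-02-10 10:40:00','EBB','2010-02-10 20:40:00'),
 (2,4,'EBB','2010-02-28 22:55:00','AMS','2010-03-01 05:35:00'),
 (2,5,'AMS','2010-03-01 10:25:00','BOD','2010-03-01 12:15:00'),
 (3,2,'BRU','2010-03-09 12:10:00','IAD','2010-03-09 14:46:00'),
 (3,3,'IAD','2010-03-13 17:57:00','BRU','2010-03-14 07:15:00'),
 (4,2,'BRU','2010-07-27','ADD','2010-07-28'),
 (4,4,'ADD','2010-07-28','LUN','2010-07-28'),
 (4,5,'LUN','2010-08-23','ADD','2010-08-23'),
 (4,6,'ADD','2010-08-23','BRU','2010-08-24');
INSERT INTO Bookings (Locator) VALUES ('5MVL7J'), ('Y2IMXQ'), ('YCBL5C'), ('X7THJ6');
""")

LONGEST_STAY = """
WITH stays AS (
    SELECT BookingID, ArrivalCity,
           FIRST_VALUE(DepartureCity) OVER w AS Origin,
           -- days between arriving and the next segment's departure
           julianday(LEAD(DepartDate) OVER w) - julianday(ArrivalDate) AS Stay
    FROM Segments
    WINDOW w AS (PARTITION BY BookingID ORDER BY SegNum)
), ranked AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY Stay DESC) AS rn
    FROM stays
    WHERE Stay IS NOT NULL      -- the last segment has no following departure
)
SELECT b.Locator, r.Origin, r.ArrivalCity AS Destination
FROM ranked r
JOIN Bookings b ON b.BookingID = r.BookingID
WHERE r.rn = 1
ORDER BY b.BookingID
"""

destinations = conn.execute(LONGEST_STAY).fetchall()
print(destinations)
```

Ordering by SegNum inside the window means gaps in the segment numbering (2, 4, 5, 6) do not matter, which is what the `segnum + 1` lookup in the original attempt tripped over.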
The statement below:
;WITH DataSource AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY BookingID ORDER BY DATEDIFF(SS,DepartDate,ArrivalDate) DESC) AS Row
,Segments.BookingID
,Segments.SegNum
,Segments.DepartureCity
,Segments.DepartDate
,Segments.ArrivalCity
,Segments.ArrivalDate
,DATEDIFF(SS,DepartDate,ArrivalDate) AS DiffInSeconds
FROM Segments
)
SELECT *
FROM DataSource DS
INNER JOIN Bookings B
ON DS.[BookingID] = B.[BookingID]
Will give the following output:
So, adding the following clause to the above statement:
WHERE Row = 1
will give you what you need.
A few important things:
As you can see from the screenshot below, there are two records with the same difference in seconds. If you want to show both of them (or all of them, if there are more), use the RANK function instead of ROW_NUMBER.
The return type of DATEDIFF is INT, so there is a limit on the maximum difference in seconds. The documentation describes it as follows:
If the return value is out of range for int (-2,147,483,648 to +2,147,483,647), an error is returned. For millisecond, the maximum difference between startdate and enddate is 24 days, 20 hours, 31 minutes and 23.647 seconds. For second, the maximum difference is 68 years.

Multiple Running Totals with Group By

I am struggling to find a good way to run running totals with a group by in it, or the equivalent. The below cursor based running total works on a complete table, but I would like to expand this to add a "Client" dimension. So I would get running totals as the below creates but for each company (ie Company A, Company B, Company C, etc.) in one table
CREATE TABLE test (tag int, Checks float, AVG_COST float, Check_total float, Check_amount float, Amount_total float, RunningTotal_Check float,
RunningTotal_Amount float)
DECLARE @tag int,
@Checks float,
@AVG_COST float,
@check_total float,
@Check_amount float,
@amount_total float,
@RunningTotal_Check float ,
@RunningTotal_Check_PCT float,
@RunningTotal_Amount float
SET @RunningTotal_Check = 0
SET @RunningTotal_Check_PCT = 0
SET @RunningTotal_Amount = 0
DECLARE aa_cursor CURSOR fast_forward
FOR
SELECT tag, Checks, AVG_COST, check_total, check_amount, amount_total
FROM test_3
OPEN aa_cursor
FETCH NEXT FROM aa_cursor INTO @tag, @Checks, @AVG_COST, @check_total, @Check_amount, @amount_total
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RunningTotal_CHeck = @RunningTotal_CHeck + @checks
set @RunningTotal_Amount = @RunningTotal_Amount + @Check_amount
INSERT test VALUES (@tag, @Checks, @AVG_COST, @check_total, @Check_amount, @amount_total, @RunningTotal_check, @RunningTotal_Amount )
FETCH NEXT FROM aa_cursor INTO @tag, @Checks, @AVG_COST, @check_total, @Check_amount, @amount_total
END
CLOSE aa_cursor
DEALLOCATE aa_cursor
SELECT *, RunningTotal_Check/Check_total as CHECK_RUN_PCT, round((RunningTotal_Check/Check_total *100),0) as CHECK_PCT_BIN, RunningTotal_Amount/Amount_total as Amount_RUN_PCT, round((RunningTotal_Amount/Amount_total * 100),0) as Amount_PCT_BIN
into test_4
FROM test ORDER BY tag
create clustered index IX_TESTsdsdds3 on test_4(tag)
DROP TABLE test
----------------------------------
I can get the running total for any one company, but I would like to do it for multiple companies to produce something like the results below.
CLIENT COUNT Running Total
Company A 1 6.7%
Company A 2 20.0%
Company A 3 40.0%
Company A 4 66.7%
Company A 5 100.0%
Company B 1 3.6%
Company B 2 10.7%
Company B 3 21.4%
Company B 4 35.7%
Company B 5 53.6%
Company B 6 75.0%
Company B 7 100.0%
Company C 1 3.6%
Company C 2 10.7%
Company C 3 21.4%
Company C 4 35.7%
Company C 5 53.6%
Company C 6 75.0%
Company C 7 100.0%
This is finally simple to do in SQL Server 2012, where SUM and COUNT support OVER clauses that contain ORDER BY. Using Cris's #Checks table definition:
SELECT
CompanyID,
count(*) over (
partition by CompanyID
order by Cleared, ID
) as cnt,
str(100.0*sum(Amount) over (
partition by CompanyID
order by Cleared, ID
)/
sum(Amount) over (
partition by CompanyID
),5,1)+'%' as RunningTotalForThisCompany
FROM #Checks;
SQL Fiddle here.
I originally started posting the SQL Server 2012 equivalent (since you didn't mention what version you were using). Steve has done a great job of showing the simplicity of this calculation in the newest version of SQL Server, so I'll focus on a few methods that work on earlier versions of SQL Server (back to 2005).
I'm going to take some liberties with your schema, since I can't figure out what all these #test and #test_3 and #test_4 temporary tables are supposed to represent. How about:
USE tempdb;
GO
CREATE TABLE dbo.Checks
(
Client VARCHAR(32),
CheckDate DATETIME,
Amount DECIMAL(12,2)
);
INSERT dbo.Checks(Client, CheckDate, Amount)
SELECT 'Company A', '20120101', 50
UNION ALL SELECT 'Company A', '20120102', 75
UNION ALL SELECT 'Company A', '20120103', 120
UNION ALL SELECT 'Company A', '20120104', 40
UNION ALL SELECT 'Company B', '20120101', 75
UNION ALL SELECT 'Company B', '20120105', 200
UNION ALL SELECT 'Company B', '20120107', 90;
Expected output in this case:
Client Count Running Total
--------- ----- -------------
Company A 1 17.54
Company A 2 43.86
Company A 3 85.96
Company A 4 100.00
Company B 1 20.55
Company B 2 75.34
Company B 3 100.00
One way:
;WITH gt(Client, Totals) AS
(
SELECT Client, SUM(Amount)
FROM dbo.Checks AS c
GROUP BY Client
), n (Client, Amount, rn) AS
(
SELECT c.Client, c.Amount,
ROW_NUMBER() OVER (PARTITION BY c.Client ORDER BY c.CheckDate)
FROM dbo.Checks AS c
)
SELECT n.Client, [Count] = n.rn,
[Running Total] = CONVERT(DECIMAL(5,2), 100.0*(
SELECT SUM(Amount) FROM n AS n2
WHERE Client = n.Client AND rn <= n.rn)/gt.Totals
)
FROM n INNER JOIN gt ON n.Client = gt.Client
ORDER BY n.Client, n.rn;
A slightly faster alternative - more reads but shorter duration and simpler plan:
;WITH x(Client, CheckDate, rn, rt, gt) AS
(
SELECT Client, CheckDate, rn = ROW_NUMBER() OVER
(PARTITION BY Client ORDER BY CheckDate),
(SELECT SUM(Amount) FROM dbo.Checks WHERE Client = c.Client
AND CheckDate <= c.CheckDate),
(SELECT SUM(Amount) FROM dbo.Checks WHERE Client = c.Client)
FROM dbo.Checks AS c
)
SELECT Client, [Count] = rn,
[Running Total] = CONVERT(DECIMAL(5,2), rt * 100.0/gt)
FROM x
ORDER BY Client, [Count];
While I've offered set-based alternatives here, in my experience a cursor is often the fastest supported way to perform running totals. There are other methods, such as the quirky update, which perform marginally faster, but the result is not guaranteed. The set-based approach where you perform a self-join becomes more and more expensive as the source row count goes up, so what seems to perform okay in testing with a small table slows down as the table gets larger.
I have a blog post almost fully prepared that goes through a slightly simpler performance comparison of various running totals approaches. It is simpler because it is not grouped and it only shows the totals, not the running total percentage. I hope to publish this post soon and will try to remember to update this space.
There is also another alternative to consider that doesn't require reading previous rows multiple times. It's a concept Hugo Kornelis describes as "set-based iteration." I don't recall where I first learned this technique, but it makes a lot of sense in some scenarios.
DECLARE @c TABLE
(
Client VARCHAR(32),
CheckDate DATETIME,
Amount DECIMAL(12,2),
rn INT,
rt DECIMAL(15,2)
);
INSERT @c SELECT Client, CheckDate, Amount,
ROW_NUMBER() OVER (PARTITION BY Client
ORDER BY CheckDate), 0
FROM dbo.Checks;
DECLARE @i INT, @m INT;
SELECT @i = 2, @m = MAX(rn) FROM @c;
UPDATE @c SET rt = Amount WHERE rn = 1;
WHILE @i <= @m
BEGIN
UPDATE c SET c.rt = c2.rt + c.Amount
FROM @c AS c
INNER JOIN @c AS c2
ON c.rn = c2.rn + 1
AND c.Client = c2.Client
WHERE c.rn = @i;
SET @i = @i + 1;
END
SELECT Client, [Count] = rn, [Running Total] = CONVERT(
DECIMAL(5,2), rt*100.0 / (SELECT TOP 1 rt FROM @c
WHERE Client = c.Client ORDER BY rn DESC)) FROM @c AS c;
While this does perform a loop, and everyone tells you that loops and cursors are bad, one gain with this method is that once the previous row's running total has been calculated, we only have to look at the previous row instead of summing all prior rows. The other gain is that in most cursor-based solutions you have to go through each client and then each check. In this case, you go through all clients' 1st checks once, then all clients' 2nd checks once. So instead of (client count * avg check count) iterations, we only do (max check count) iterations. This solution doesn't make much sense for the simple running totals example, but for the grouped running totals example it should be tested against the set-based solutions above. Not a chance it will beat Steve's approach, though, if you are on SQL Server 2012.
UPDATE
I've blogged about various running totals approaches here:
http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals
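To sanity-check the expected output listed earlier for the Company A / Company B sample, here is a sketch of the windowed running-total percentage using Python's sqlite3 module rather than SQL Server. The table is named Checks without the dbo schema, and the dates are stored as ISO text; both are adaptations for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Checks (Client TEXT, CheckDate TEXT, Amount REAL);
INSERT INTO Checks VALUES
    ('Company A', '2012-01-01', 50),
    ('Company A', '2012-01-02', 75),
    ('Company A', '2012-01-03', 120),
    ('Company A', '2012-01-04', 40),
    ('Company B', '2012-01-01', 75),
    ('Company B', '2012-01-05', 200),
    ('Company B', '2012-01-07', 90);
""")

RUNNING_PCT = """
SELECT Client,
       ROW_NUMBER() OVER (PARTITION BY Client ORDER BY CheckDate) AS cnt,
       -- running sum per client divided by that client's grand total
       ROUND(100.0 * SUM(Amount) OVER (PARTITION BY Client ORDER BY CheckDate)
                   / SUM(Amount) OVER (PARTITION BY Client), 2) AS running_pct
FROM Checks
ORDER BY Client, cnt
"""

running = conn.execute(RUNNING_PCT).fetchall()
for row in running:
    print(row)
```

Each client's percentages restart at its own first check and end at 100, which is the grouped behaviour the cursor version in the question lacks.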
I didn't exactly understand the schema you were pulling from, but here is a quick query using a temp table that shows how to do a running total in a set based operation.
CREATE TABLE #Checks
(
ID int IDENTITY(1,1) PRIMARY KEY
,CompanyID int NOT NULL
,Amount float NOT NULL
,Cleared datetime NOT NULL
)
INSERT INTO #Checks
VALUES
(1,5,'4/1/12')
,(1,5,'4/2/12')
,(1,7,'4/5/12')
,(2,10,'4/3/12')
SELECT Info.ID, Info.CompanyID, Info.Amount, RunningTotal.Total, Info.Cleared
FROM
(
SELECT main.ID, SUM(other.Amount) as Total
FROM
#Checks main
JOIN
#Checks other
ON
main.CompanyID = other.CompanyID
AND
main.Cleared >= other.Cleared
GROUP BY
main.ID) RunningTotal
JOIN
#Checks Info
ON
RunningTotal.ID = Info.ID
DROP TABLE #Checks

Remove duplicates (1 to many) or write a subquery that solves my problem

Referring to the diagram below, the Records table has unique records. Each record is updated, via comments, through an Updates table. When I join the two I get lots of duplicates.
How to remove duplicates? Group By does not work for me as I have more than 10 fields in select query and some of them are functions.
Write a sub query which pulls the last updates in the Update table for each record that is updated in a particular month. Joining with this sub query will solve my problem.
Thanks!
Edit
Table structure that is of interest is
create table Records(
recordID int,
90more_fields various
)
create table Updates(
update_id int,
record_id int,
comment text,
byUser varchar(25),
datecreate datetime
)
Here's one way.
SELECT * /*But list columns explicitly*/
FROM Orange o
CROSS APPLY (SELECT TOP 1 *
FROM Blue b
WHERE b.datecreate >= '20110901'
AND b.datecreate < '20111001'
AND o.RecordID = b.Record_ID2
ORDER BY b.datecreate DESC) b
Based on the limited information available...
WITH cteLastUpdate AS (
SELECT Record_ID2, UpdateDateTime,
ROW_NUMBER() OVER(PARTITION BY Record_ID2 ORDER BY UpdateDateTime DESC) AS RowNUM
FROM BlueTable
/* Add WHERE clause if needed to restrict date range */
)
SELECT *
FROM cteLastUpdate lu
INNER JOIN OrangeTable o
ON lu.Record_ID2 = o.RecordID
WHERE lu.RowNum = 1
Last updates per record and month:
SELECT *
FROM UPDATES outerUpd
WHERE exists
(
-- Magic part
SELECT 1
FROM UPDATES innerUpd
WHERE innerUpd.RecordId = outerUpd.RecordId
GROUP BY RecordId
, date_part('year', innerUpd.datecolumn)
, date_part('month', innerUpd.datecolumn)
HAVING max(innerUpd.datecolumn) = outerUpd.datecolumn
)
(Works on PostgreSQL, date_part is different in other RDBMS)
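A ROW_NUMBER variant of "last update per record and month" can be sketched as follows, using Python's sqlite3 module. The four update rows are hypothetical, the column names follow the Updates table in the question, and strftime is SQLite's way of extracting the year and month (other databases use date_part, YEAR/MONTH or similar).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Updates (update_id INTEGER, record_id INTEGER,
                      comment TEXT, datecreate TEXT);
-- hypothetical rows: record 10 has two updates in September and one in October
INSERT INTO Updates VALUES
    (1, 10, 'first',  '2011-09-03 09:00:00'),
    (2, 10, 'second', '2011-09-20 14:30:00'),
    (3, 10, 'third',  '2011-10-02 08:15:00'),
    (4, 11, 'only',   '2011-09-11 11:00:00');
""")

LAST_PER_MONTH = """
SELECT update_id, record_id, comment
FROM (
    SELECT u.*,
           -- rn = 1 is the newest update of each record within each month
           ROW_NUMBER() OVER (
               PARTITION BY record_id, strftime('%Y-%m', datecreate)
               ORDER BY datecreate DESC) AS rn
    FROM Updates u
) ranked
WHERE rn = 1
ORDER BY record_id, datecreate
"""

last_updates = conn.execute(LAST_PER_MONTH).fetchall()
print(last_updates)
```

Joining Records to this result instead of to the raw Updates table gives at most one row per record and month, so the duplicates disappear without a GROUP BY over all the selected columns.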