SQL - sum column for every date - sql

This seemed like a very easy thing to do but I got stuck. I have a query like this:
select op.date, count(p.numberofoutstanding)
from people p
left join outstandingpunches op
on p.fullname = op.fullname
group by op.date
That outputs a table like this:
How can I sum over the dates so the sum for each row is equal to the sums up to that date? For example, the first column would be 27, the second would be 27 + 4, the third 27 + 4 + 11, etc.
I came across a couple of related questions and saw that people use OVER in their queries for this, but I'm confused about what I have to partition by. I tried partitioning by date, but it gives me incorrect results.

You can use a cumulative sum. This looks like:
select op.date, count(*),
sum(count(*)) over (order by op.date) as running_count
from people p join
outstandingpunches op
on p.fullname = op.fullname
group by op.date;
Note: I changed the join from a left join to an inner join. You are aggregating by a column in the second table. Your results have no examples of a NULL date column and that doesn't seem useful. Hence, it seems that rows are assumed to match.
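To see the running total in action, here is a minimal sketch using Python's built-in sqlite3 module (this assumes a SQLite build with window-function support, 3.25 or later). The names and dates are made up for illustration, and the daily counts are computed in a CTE first so each step is visible; that is meant to be equivalent to the nested sum(count(*)) form above, not a replacement for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (fullname TEXT);
    CREATE TABLE outstandingpunches (fullname TEXT, date TEXT);
    -- Hypothetical sample data: 3 punches on day 1, 1 on day 2, 2 on day 3.
    INSERT INTO people VALUES ('alice'), ('bob'), ('carol');
    INSERT INTO outstandingpunches VALUES
        ('alice', '2020-01-01'), ('bob', '2020-01-01'), ('carol', '2020-01-01'),
        ('alice', '2020-01-02'),
        ('bob', '2020-01-03'), ('carol', '2020-01-03');
""")

running_rows = conn.execute("""
    WITH daily AS (
        -- Step 1: one row per date with that date's count.
        SELECT op.date AS date, COUNT(*) AS cnt
        FROM people p
        JOIN outstandingpunches op ON p.fullname = op.fullname
        GROUP BY op.date
    )
    -- Step 2: ORDER BY inside OVER (with no PARTITION BY) accumulates
    -- the counts from the first date up to the current one.
    SELECT date, cnt, SUM(cnt) OVER (ORDER BY date) AS running_count
    FROM daily
    ORDER BY date
""").fetchall()

for row in running_rows:
    print(row)
```

Note there is no PARTITION BY at all: partitioning by date would restart the sum on every date, which is why that attempt gave per-date values instead of a running total.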

I believe you need to use sum and not count.
select o.date_c,
sum(sum(p.numberofoutstanding)) over (order by o.date_c)
from people p
left join outstandingpunches o on p.fullname = o.fullname
group by o.date_c;
Keep in mind that I have renamed your column date to date_c. I believe you should not use data type names as column names.

Related

Find the Max Value in a list of rows and return the whole row SQL Server

Hi I am trying to figure out how to return one row from a table that has the highest value in a column. Before I can do that I have to take a substring of that column and convert it to an INT.
The Main table name is "CalendarPeriod"
Here is an example of the column
FiscalPeriod - 0120
I have to take the two characters from the left, which will be the number I am trying to find the MAX value of
Below is a list of the rows with their column names. FiscalPeriod is the column that I need to substring and cast to an INT to find the max value.
The result of this query should return the row with the FiscalPeriod "1220" and the Id "1134", but it is returning the first one.
Here is the query that I found, but I am having trouble figuring out how to convert the value back to NVARCHAR. I thought that if I added the regular FiscalPeriod inside the inner join, I would be able to use it in the WHERE clause to get the max value, but I was wrong.
I am also inner joining another table, "CalendarPeriodHeader", which has the year that I also need for a comparison in the WHERE clause. This might be messing up the query results as well, but I am stumped. Below is the query I have so far.
Select Top 1 cp.*
From CalendarPeriod as cp
INNER JOIN
(
Select Top 1 Id, Max(Cast(Substring(FiscalPeriod, 1,2) as Int)) as MAXFiscal, FiscalPeriod
From CalendarPeriod
Group By Id, FiscalPeriod
)cp2
On cp2.Id = cp.Id
Inner Join CalendarPeriodHeader as header on header.Id = cp.HeaderId
Where cp2.FiscalPeriod = cp.FiscalPeriod And header.Year = 2020
I would use this query:
SELECT TOP 1 * FROM dbo.CalendarPeriod ORDER BY Cast(Substring(FiscalPeriod, 1,2) as Int) DESC
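A quick way to check the idea is to run it against a throwaway table. The sketch below uses Python's sqlite3 module, so TOP 1 becomes LIMIT 1 and Substring becomes substr (SQLite spellings, not SQL Server ones). Only the 1220 / 1134 row comes from the question; the other ids are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CalendarPeriod (Id INTEGER, FiscalPeriod TEXT);
    -- Ids 1123 and 1126 are hypothetical; 1134 / '1220' is from the question.
    INSERT INTO CalendarPeriod VALUES
        (1123, '0120'), (1134, '1220'), (1126, '0420');
""")

# Sort by the numeric value of the first two characters, highest first,
# and keep only the top row (SQLite's LIMIT 1 stands in for TOP 1).
top_row = conn.execute("""
    SELECT Id, FiscalPeriod
    FROM CalendarPeriod
    ORDER BY CAST(substr(FiscalPeriod, 1, 2) AS INTEGER) DESC
    LIMIT 1
""").fetchone()

print(top_row)
```

Because the whole row is returned, the original NVARCHAR FiscalPeriod comes back untouched; the cast is only used for ordering.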

SQL Work out average from joined column

I have 3 columns I need to display, and I need to add another column that calculates the AVG of the CLUB_FEE column. My code does not work; it throws a "not a single-group group function" error. Can someone please help? Here is my SQL:
SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE, AVG(C.CLUB_FEE) AVGINCOME
FROM SUBSCRIPTION S, CLUB C
WHERE S.CLUB_ID = C.CLUB_ID;
I suggest using an INNER JOIN. Also, when you include an aggregate function (like AVG or SUM) in your query, you must group by all the non-aggregated columns in the SELECT list:
SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE, AVG(C.CLUB_FEE) as AVGINCOME
FROM SUBSCRIPTION S INNER JOIN CLUB C
ON S.CLUB_ID = C.CLUB_ID
GROUP BY
S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE ;
Learn to use explicit JOIN syntax. Simple rule: Never use commas in the FROM clause. Always use explicit JOIN syntax.
In your case, you need to remove columns from the SELECT and the GROUP BY. If you want the average fee paid by any member, then you don't need the GROUP BY at all:
SELECT AVG(C.CLUB_FEE) as AVGINCOME
FROM SUBSCRIPTION S JOIN
CLUB C
ON S.CLUB_ID = C.CLUB_ID;
If you want to control the formatting, either use to_char():
SELECT TO_CHAR(AVG(C.CLUB_FEE), '999.99') as AVGINCOME
(check the documentation for other formats).
Or, cast to a decimal:
SELECT CAST(AVG(C.CLUB_FEE) AS DECIMAL(10, 2)) as AVGINCOME
If you need to display the three columns and the average, not just the average alone, you can do something like this:
SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE, A.AVGINCOME
FROM SUBSCRIPTION S INNER JOIN CLUB C
ON S.CLUB_ID = C.CLUB_ID
CROSS JOIN (SELECT AVG(CLUB_FEE) AS AVGINCOME FROM CLUB) A
;
If you need the average rounded to two decimal places, use ROUND(AVG(CLUB_FEE), 2) in the subquery.
A fancier solution, which doesn't require a join (so it doesn't scan the CLUB table twice), uses AVG as an analytic function that doesn't partition by anything. You still need the OVER clause (here with PARTITION BY NULL, which puts every row in a single partition) to indicate it's used as an analytic function, not as an aggregate.
SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE,
ROUND(AVG(C.CLUB_FEE) OVER (PARTITION BY NULL)) AS AVGINCOME
FROM SUBSCRIPTION S INNER JOIN CLUB C
ON S.CLUB_ID = C.CLUB_ID
;
Even fancier (although functionally identical) - the keyword OVER is needed to indicate analytic function, but you can also write it as OVER() (no need to even mention PARTITION BY NULL).
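Here is a small sketch of the OVER () form next to the CROSS JOIN form, run through Python's sqlite3 module (assuming a SQLite build with window functions, 3.25+). The members and fees are invented. One thing it makes visible: the analytic AVG runs over the joined rows, so a club with more subscribers weighs more, while the CROSS JOIN subquery averages the CLUB table itself. Which one you want depends on what "average" should mean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLUB (CLUB_ID INTEGER, CLUB_FEE INTEGER);
    CREATE TABLE SUBSCRIPTION (MEMBER_ID INTEGER, CLUB_ID INTEGER);
    -- Hypothetical data: club 1 has two members, club 2 has one.
    INSERT INTO CLUB VALUES (1, 10), (2, 25);
    INSERT INTO SUBSCRIPTION VALUES (1, 1), (2, 1), (3, 2);
""")

# Analytic AVG with an empty OVER (): every detail row is kept and each one
# carries the average over all joined rows: (10 + 10 + 25) / 3.
window_rows = conn.execute("""
    SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE,
           AVG(C.CLUB_FEE) OVER () AS AVGINCOME
    FROM SUBSCRIPTION S
    INNER JOIN CLUB C ON S.CLUB_ID = C.CLUB_ID
    ORDER BY S.MEMBER_ID
""").fetchall()

# CROSS JOIN against a one-row subquery: the average is taken over the
# CLUB table only: (10 + 25) / 2.
cross_rows = conn.execute("""
    SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE, A.AVGINCOME
    FROM SUBSCRIPTION S
    INNER JOIN CLUB C ON S.CLUB_ID = C.CLUB_ID
    CROSS JOIN (SELECT AVG(CLUB_FEE) AS AVGINCOME FROM CLUB) A
    ORDER BY S.MEMBER_ID
""").fetchall()

print(window_rows)
print(cross_rows)
```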

Count items in SQL Server

This is my database
I want to count the Bikes that are currently available in a RouteCode within the SAME EXPIRY WEEK. So if the EXPIRY WEEKs are different, the RouteCode can reappear; otherwise the RouteCode has to be displayed once with the BikeQuantity it has.
This is my problem: RouteCode = G shows up 2 times with 1 bike each, even though both bikes expire in the same week. How can I make it show 2 in the BikeQuantity column?
The first problem is that you are GROUPing by 'FoundDate'. You need to group by 'ExpiryWeek' if you want to aggregate (i.e. sum) the number of bikes on a per-ExpiryWeek basis.
The second problem is that you are SELECTing 'FoundDate' and 'ExpiryDate'. You cannot select these columns if you're grouping on 'ExpiryWeek' because there's no way to aggregate the data (dates) in those columns*. It follows that you should not order by 'ExpiryDate' (since this column won't appear in the output table).
( * ...because if you have two entries for a given expiry week but with a different 'FoundDate' for each, what would you expect to see in the result table in the 'FoundDate' column for that row?)
Change your SELECT clause to this:
SELECT li.RouteCode,
DATEPART(WK,DATEADD(WEEK, 4, FoundDate)) as ExpiryWeek,
COUNT(b.BikeId) As BikeQuantity
and change your GROUP BY clause to this:
GROUP BY li.RouteCode, DATEPART(WK,DATEADD(WEEK, 4, FoundDate))
ORDER BY DATEPART(WK,DATEADD(WEEK, 4, FoundDate));
and your query should work.
(Because 'ExpiryWeek' is a calculated column, you have to supply the same calculation in the GROUP BY and ORDER BY clauses - just specifying 'ExpiryWeek' won't work, at least for some variants of SQL. There are ways around this: for example, you could use a 'With' clause. See this answer for examples which avoid duplication: How to group by a Calculated Field)
Based on the latest code you have posted, the correct query should be:
select li.RouteCode,
DATEPART(WK,DATEADD(WEEK, 4, b.FoundDate)) as ExpiryWeek,
COUNT(b.BikeId) as BikeQuantity
FROM dbo.Bike b
LEFT OUTER JOIN dbo.Contact ct ON b.ContactId = ct.ContactId
INNER JOIN dbo.LocationInfo li ON li.PostCode = ct.PostCode
WHERE DATEADD(day, 30, FoundDate) >= GETDATE()
Group by li.RouteCode, DATEPART(WK,DATEADD(WEEK, 4, FoundDate))
ORDER BY DATEPART(WK,DATEADD(WEEK, 4, FoundDate));
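As a rough illustration of grouping on the calculated week without repeating the expression, here is a sketch using a WITH clause, run through Python's sqlite3 module. Everything about the schema is simplified and hypothetical (a single Bike table that already carries RouteCode), and SQLite's strftime('%W', ...) and date(..., '+28 days') stand in for DATEPART(WK, ...) and DATEADD(WEEK, 4, ...), so the week numbering is not guaranteed to match SQL Server's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, flattened schema: RouteCode stored directly on Bike.
    CREATE TABLE Bike (BikeId INTEGER, RouteCode TEXT, FoundDate TEXT);
    INSERT INTO Bike VALUES
        (1, 'G', '2024-01-01'),
        (2, 'G', '2024-01-03'),   -- same expiry week as bike 1
        (3, 'G', '2024-01-15'),   -- a later expiry week
        (4, 'H', '2024-01-02');
""")

week_rows = conn.execute("""
    WITH b AS (
        -- Compute the expiry week once, then refer to it by name below.
        SELECT RouteCode, BikeId,
               strftime('%W', date(FoundDate, '+28 days')) AS ExpiryWeek
        FROM Bike
    )
    SELECT RouteCode, ExpiryWeek, COUNT(BikeId) AS BikeQuantity
    FROM b
    GROUP BY RouteCode, ExpiryWeek
    ORDER BY ExpiryWeek, RouteCode
""").fetchall()

for row in week_rows:
    print(row)
```

Route G appears once with a quantity of 2 for the shared week and once more for the later week, which is the shape the question asks for.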

left join not doing as expected with sum and group by

This is all going to have to be pseudocode, as I am on my phone and have no internet access right now because I have just moved, but it's bugging the crap out of me. This also means I can't do code blocks, so please bear with me: I'll try.
I have a table with amounts in it, and I have a table with labels. I want to sum the amounts in the first table grouped by the labels. The problem is that if there are no records for a label in the table with the amounts, then I don't get a record in the result set for that label. I need a record there with nulls for the amount table's fields. Here is what some sample data might look like:
Amount_table:
Columns: id, tpa, amt, link_to_label_table
Data:
1, GTL, 2000, 1
2, GTL, 1000, 1
Label_table:
Columns: link_to_amount_table, label_name
Data:
1, Label1
2, Label2
Query:
Select at.tpa, sum(at.amt) as amt, lt.label_name
From Amount_table as at
Left join Label_table lt on lt.link_to_amount_table = at.link_to_label_table
Where at.tpa = 'GTL'
Group by lt.label_name, at.tpa
Now this returns:
GTL, 3000, Label1
I tried selecting from the labels table then left joining the amount table and it still didn't give my desired results which are:
GTL, 3000, Label1
Null, Null, Label2
Is this possible with the sum and group by? The fields being grouped by have to be there otherwise you get an error.
This is in DB2 by the way. Is there any way possible to get this to return the way I need it? I have to get the labels; they are dynamic.
On the face of it, you want to have your labels table as the dominant table and the amounts table as the one that is outer joined.
SELECT a.tpa, sum(a.amt) as amt, l.label_name
FROM Label_table AS l
LEFT JOIN Amount_table AS a
ON l.link_to_amount_table = a.link_to_label_table
GROUP BY l.label_name, a.tpa
You have a condition Amount_table.tpa = 'GTL'; it is not entirely clear why you have that, but presumably it is significant with more data in the tables. There are (at least) two ways you can incorporate that condition into the query (other than the one you chose - which eliminates the rows where a.tpa is null).
SELECT a.tpa, sum(a.amt) as amt, l.label_name
FROM Label_table AS l
LEFT JOIN Amount_table AS a
ON l.link_to_amount_table = a.link_to_label_table
AND a.tpa = 'GTL'
GROUP BY l.label_name, a.tpa
Or:
SELECT a.tpa, sum(a.amt) as amt, l.label_name
FROM Label_table AS l
LEFT JOIN (SELECT *
FROM Amount_table
WHERE tpa = 'GTL') AS a
ON l.link_to_amount_table = a.link_to_label_table
GROUP BY l.label_name, a.tpa
A decent optimizer will produce the same query plan for both, so it probably doesn't matter which you use. There's an argument that suggests the second alternative is cleaner in that the ON clause is primarily for joining conditions, and the filter condition on a.tpa is not a joining condition. There's another argument that says the first alternative avoids a sub-query and is therefore preferable. I'd validate that the query plans are the same and would probably choose the second, but it is a somewhat nebulous decision based on a mild preference.
You were so close on your second try. Change WHERE to AND. This has the effect of applying at.tpa='GTL' to the JOIN instead of applying it to the filter so you don't filter out the NULLs.
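To make the WHERE-versus-ON difference concrete, here is a small sketch that loads the sample data from the question into an in-memory SQLite database via Python's sqlite3 module and runs both variants. SQLite is only a stand-in for DB2 here, and the grouping uses label_name, which is assumed to be the intended column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Amount_table (id INTEGER, tpa TEXT, amt INTEGER,
                               link_to_label_table INTEGER);
    CREATE TABLE Label_table (link_to_amount_table INTEGER, label_name TEXT);
    INSERT INTO Amount_table VALUES (1, 'GTL', 2000, 1), (2, 'GTL', 1000, 1);
    INSERT INTO Label_table VALUES (1, 'Label1'), (2, 'Label2');
""")

# Filter in WHERE: applied after the outer join, so the NULL-extended row
# for Label2 is thrown away (NULL = 'GTL' is not true).
where_rows = conn.execute("""
    SELECT a.tpa, SUM(a.amt) AS amt, l.label_name
    FROM Label_table AS l
    LEFT JOIN Amount_table AS a
      ON l.link_to_amount_table = a.link_to_label_table
    WHERE a.tpa = 'GTL'
    GROUP BY l.label_name, a.tpa
    ORDER BY l.label_name
""").fetchall()

# Filter in ON: applied while matching, so Label2 simply finds no match
# and survives with NULLs on the amount side.
on_rows = conn.execute("""
    SELECT a.tpa, SUM(a.amt) AS amt, l.label_name
    FROM Label_table AS l
    LEFT JOIN Amount_table AS a
      ON l.link_to_amount_table = a.link_to_label_table
     AND a.tpa = 'GTL'
    GROUP BY l.label_name, a.tpa
    ORDER BY l.label_name
""").fetchall()

print(where_rows)
print(on_rows)
```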

Aggregate SQL Function to grab only the first from each group

I have 2 tables - an Account table and a Users table. Each account can have multiple users. I have a scenario where I want to execute a single query/join against these two tables, but I want all the Account data (Account.*) and only the first set of user data (specifically their name).
Instead of doing a "min" or "max" on my aggregated group, I wanted to do a "first". But, apparently, there is no "First" aggregate function in TSQL.
Any suggestions on how to go about getting this query? Obviously, it is easy to get the cartesian product of Account x Users:
SELECT User.Name, Account.* FROM Account, User
WHERE Account.ID = User.Account_ID
But how might I go about getting only the first user from the product, based on the order of their User.ID?
Rather than grouping, go about it like this...
select
*
from account a
join (
select
account_id,
row_number() over (order by account_id, id) -
rank() over (order by account_id) as row_num from [user]
) first on first.account_id = a.id and first.row_num = 0
I know my answer is a bit late, but that might help others. There is a way to achieve a First() and Last() in SQL Server, and here it is :
Stuff(Min(Convert(Varchar, DATE_FIELD, 126) + Convert(Varchar, DESIRED_FIELD)), 1, 23, '')
Use Min() for First() and Max() for Last(). DATE_FIELD should be the date that determines which record is first or last. DESIRED_FIELD is the field whose first or last value you want. What it does is:
Put the date in ISO format at the start of the string (23 characters long)
Append the DESIRED_FIELD to that string
Get the MIN/MAX value of that string (since it starts with the date, you will get the first or last record)
Stuff that concatenated string to remove the first 23 characters (the date part)
Here you go!
EDIT: I ran into a problem with the first formula: when the DATE_FIELD has .000 as milliseconds, SQL Server returns the date as a string with NO milliseconds at all, so the STUFF removes the first 4 characters of the DESIRED_FIELD. I simply changed the format to "20" (without milliseconds) and it works. The only downside is that if two records were created in the same second, the order between them is undefined, in which case you can revert to "126" for the format.
Stuff(Max(Convert(Varchar, DATE_FIELD, 20) + Convert(Varchar, DESIRED_FIELD)), 1, 19, '')
EDIT 2: My original intent was to return the last (or first) NON NULL row. I was asked how to return the last or first row, whether it is null or not. Simply add an ISNULL to the DESIRED_FIELD. When you concatenate two strings with the + operator and one of them is NULL, the result is NULL. So use the following:
Stuff(Max(Convert(Varchar, DATE_FIELD, 20) + IsNull(Convert(Varchar, DESIRED_FIELD), '')), 1, 19, '')
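The same prefix-then-strip idea can be tried outside SQL Server. The sketch below is a translation to SQLite run through Python's sqlite3 module, not the T-SQL above: || replaces +, and substr(..., 20) plays the role of Stuff(..., 1, 19, ''). The table, column names and rows are hypothetical, and the timestamps are stored as fixed-width 19-character strings so that string order matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; created is always 'YYYY-MM-DD HH:MM:SS' (19 chars).
    CREATE TABLE events (grp INTEGER, created TEXT, val TEXT);
    INSERT INTO events VALUES
        (1, '2020-01-02 08:00:00', 'second'),
        (1, '2020-01-01 09:30:00', 'first'),
        (2, '2020-03-05 12:00:00', 'only');
""")

# Prefix each value with its timestamp, take MIN/MAX of the combined
# string, then drop the 19-character timestamp again.
first_last = conn.execute("""
    SELECT grp,
           substr(MIN(created || val), 20) AS first_val,
           substr(MAX(created || val), 20) AS last_val
    FROM events
    GROUP BY grp
    ORDER BY grp
""").fetchall()

print(first_last)
```

The trick only works when the date prefix has a fixed width, which is exactly the issue the EDIT above describes with style 126.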
Select *
From Accounts a
Left Join (
Select u.*,
row_number() over (Partition By u.AccountKey Order By u.UserKey) as Ranking
From Users u
) as UsersRanked
on UsersRanked.AccountKey = a.AccountKey and UsersRanked.Ranking = 1
This can be simplified by using the Partition By clause. In the above, if an account has three users, then the subquery numbers them 1, 2, and 3, and for a different AccountKey it will reset the numbering. This means that for each unique AccountKey there will always be a 1, and potentially 2, 3, 4, etc.
Thus you filter on Ranking=1 to grab the first from each group.
This will give you one row per account, and if there is at least one user for that account, it will give you the user with the lowest key (because I use a left join, you will always get an account listing even if no user exists). Replace Order By u.UserKey with another field if you prefer that the first user be chosen alphabetically or by some other criterion.
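For anyone who wants to watch the numbering reset per account, here is a minimal sketch run through Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The account and user rows, and the AccountName and Name columns, are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Accounts (AccountKey INTEGER, AccountName TEXT);
    CREATE TABLE Users (UserKey INTEGER, AccountKey INTEGER, Name TEXT);
    -- Hypothetical data: account 1 has two users, 2 has one, 3 has none.
    INSERT INTO Accounts VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO Users VALUES (10, 1, 'Zed'), (11, 1, 'Amy'), (12, 2, 'Bob');
""")

first_users = conn.execute("""
    SELECT a.AccountKey, UsersRanked.Name
    FROM Accounts a
    LEFT JOIN (
        -- Numbering restarts at 1 for every AccountKey.
        SELECT u.*,
               row_number() OVER (PARTITION BY u.AccountKey
                                  ORDER BY u.UserKey) AS Ranking
        FROM Users u
    ) AS UsersRanked
      ON UsersRanked.AccountKey = a.AccountKey
     AND UsersRanked.Ranking = 1
    ORDER BY a.AccountKey
""").fetchall()

print(first_users)
```

Account 1 gets 'Zed' because UserKey 10 is the lowest key, not because of alphabetical order, and account 3 still shows up with a NULL user thanks to the left join.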
I've benchmarked all the methods; the simplest and fastest way to achieve this is OUTER/CROSS APPLY:
SELECT u.Name, Account.* FROM Account
OUTER APPLY (SELECT TOP 1 * FROM [User] WHERE Account.ID = Account_ID ORDER BY ID) as u
CROSS APPLY works just like INNER JOIN and fetches the rows where both tables are related, while OUTER APPLY works like LEFT OUTER JOIN and fetches all rows from the left table (Account here)
You can use OUTER APPLY, see documentation.
SELECT User1.Name, Account.* FROM Account
OUTER APPLY
(SELECT TOP 1 Name
FROM [User]
WHERE Account.ID = [User].Account_ID
ORDER BY Name ASC) User1
SELECT (SELECT TOP 1 Name
FROM [User]
WHERE Account_ID = a.AccountID
ORDER BY UserID) [Name],
a.*
FROM Account a
The STUFF response from Dominic Goulet is slick. But, if your DATE_FIELD is SMALLDATETIME (instead of DATETIME), then the ISO 8601 length will be 19 instead of 23 (because SMALLDATETIME has no milliseconds) - so adjust the STUFF parameter accordingly or the return value from the STUFF function will be incorrect (missing the first four characters).
First and Last do not exist in SQL Server 2005 or 2008, but SQL Server 2012 has the First_Value and Last_Value functions. I tried to implement the aggregates First and Last for SQL Server 2005 and hit the obstacle that SQL Server does not guarantee that an aggregate is calculated in a defined order. (See the SqlUserDefinedAggregateAttribute.IsInvariantToOrder property, which is not implemented.) This might be because the query analyser tries to execute the calculation of the aggregate on multiple threads and combine the results, which speeds up execution but does not guarantee the order in which elements are aggregated.
Define "First". What you think of as first is a coincidence that normally has to do with clustered index order but should not be relied on (you can contrive examples that break it).
You are right not to use MAX() or MIN(). While tempting, consider the scenario where the first name and last name are in separate fields. You might get names from different records.
Since it sounds like all you really care about is getting exactly one arbitrary record for each group, what you can do is just MIN or MAX an ID field for that record, and then join the table into the query on that ID.
There are a number of ways of doing this; here is a quick and dirty one.
Select (SELECT TOP 1 U.Name FROM Users U WHERE U.Account_ID = A.ID) AS [Name],
A.*
FROM Account A
(Slightly Off-Topic, but) I often run aggregate queries to list exception summaries, and then I want to know WHY a customer is in the results, so use MIN and MAX to give 2 semi-random samples that I can look at in details e.g.
SELECT Customer.Id, COUNT(*) AS ProblemCount
, MIN(Invoice.Id) AS MinInv, MAX(Invoice.Id) AS MaxInv
FROM Customer
INNER JOIN Invoice on Invoice.CustomerId = Customer.Id
WHERE Invoice.SomethingHasGoneWrong=1
GROUP BY Customer.Id
Create and join with a subselect 'FirstUser' that returns the first user for each account
SELECT User.Name, Account.*
FROM Account, User,
(select min(user.id) id,account_id from User group by user.account_id) as firstUser
WHERE Account.ID = User.Account_ID
and User.id = firstUser.id and Account.ID = firstUser.account_id