SQL group by 1 column but include TOP 1 of other columns - sql

I am trying to build a SQL query where I group by 1 column, but then also include the values of other columns from an arbitrary record in each group. So, something like
SELECT BoxNo
FROM MuffinData
WHERE FrostingTimeApplied >= CONVERT(date, GETDATE())
GROUP BY BoxNo
but including some value from columns MuffinType, FrostingType in the result (I know that there will be only 1 value of MuffinType and FrostingType per box.)

You have to use an aggregate function for each column selected that is not present in the GROUP BY clause:
SELECT BoxNo, MAX(MuffinType) AS MuffinType, MAX(FrostingType) AS FrostingType
FROM MuffinData
WHERE FrostingTimeApplied >= CONVERT(date, GETDATE())
GROUP BY BoxNo
If there is only 1 value of MuffinType and FrostingType per box, then MAX simply returns that single value for each BoxNo in the query above.

"I know that there will be only 1 value of MuffinType and FrostingType per box"
If that's indeed the case, a simple DISTINCT should do the trick, like so:
SELECT DISTINCT BoxNo, MuffinType, FrostingType
FROM MuffinData
WHERE FrostingTimeApplied >= CONVERT(date, GETDATE());
If that's not the case, you're dealing with a problem known generally as the Top N per group problem. You can find coverage of the problem and suggested solutions here.
Cheers,
Itzik
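For the case where the values can differ within a box, one common way to approach the Top N per group problem in SQL Server is ROW_NUMBER(). The following is only a sketch: it assumes "an arbitrary record" may be taken to be the most recently frosted row, which the question does not state, and it reuses the table and column names from the question.

```sql
-- Sketch: keep exactly one row per BoxNo.
-- Assumption: the row with the latest FrostingTimeApplied is an acceptable pick.
WITH Ranked AS (
    SELECT BoxNo,
           MuffinType,
           FrostingType,
           ROW_NUMBER() OVER (PARTITION BY BoxNo
                              ORDER BY FrostingTimeApplied DESC) AS rn
    FROM MuffinData
    WHERE FrostingTimeApplied >= CONVERT(date, GETDATE())
)
SELECT BoxNo, MuffinType, FrostingType
FROM Ranked
WHERE rn = 1;  -- rn = 1 is the first row of each BoxNo partition
```

Unlike MAX on each column, this returns MuffinType and FrostingType from the same source row.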

If you're grouping by anything, then the only way to do this in a single statement (that I'm aware of) is to have the other columns you're returning be the result of an aggregate function. Aggregate functions take multiple values and return a single result, for example SUM, MAX, MIN, COUNT, etc.
SELECT BoxNo, COUNT(MuffinData.ID), MAX(FrostingType.FlavorID) FROM MuffinData, FrostingType etc...
You might have to adjust your WHERE logic or have another data source in your FROM list (subquery).

You can use a CTE and join back to the original table to get the fields you want. In this case,
WITH BoxGroup AS (
    SELECT BoxNo
    FROM MuffinData
    WHERE FrostingTimeApplied >= CONVERT(date, GETDATE())
    GROUP BY BoxNo
)
SELECT md.BoxNo, md.MuffinType, md.FrostingType
FROM MuffinData md
INNER JOIN BoxGroup bg ON bg.BoxNo = md.BoxNo

Related

Get the following record in query

Suppose we have a table called Activity with the columns ActivityCode and StartTime,
for example
ActivityCode     StartTime
Lunch            1200
MathClass        1300
EnglishClass     1500
EndOfSchool      1700
Now I want to write one SQL query that displays the following:
ActivityCode     StartTime     EndTime
Lunch            1200          1300
MathClass        1300          1500
EnglishClass     1500          1700
EndOfSchool      1700          1700
I am not sure how to do it. I tried to follow "How to get a value from previous result row of a SELECT statement?", but it didn't work as I expected. Any help is appreciated.
You can use this query:
SELECT
    Activity.ActivityCode,
    Activity.StartTime,
    Nz((Select Top 1 StartTime
        From Activity As T
        Where T.StartTime > Activity.StartTime
        Order By StartTime Asc),
       [StartTime]) AS EndTime,
    CDate(TimeSerial(Val([EndTime])\100, Val([EndTime]) Mod 100, 0) -
          TimeSerial(Val([StartTime])\100, Val([StartTime]) Mod 100, 0)) AS Duration
FROM
    Activity;
I would use a subquery with aggregation:
select a.*,
(select nz(min(a2.starttime), a.starttime)
from activity as a2
where a2.starttime > a.starttime
) as endtime
from activity as a;
Normally in such an example, there would be an additional column identifying a "grouping" of some sort -- such as a person. If you have such a column, you would have an equality condition in the subquery as well as the inequality on time.
Also, there are much better ways to do this in almost any other database -- notably, the lead() function.
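As a rough illustration of the lead() approach, here is a sketch for a database that supports window functions (SQL Server 2012 or later, for instance; Access does not have lead()). It assumes the same Activity table as in the question and uses the row's own StartTime as the fallback for the last row, matching the desired output.

```sql
-- Sketch: EndTime is the StartTime of the next activity in time order.
-- The third argument of LEAD is the default used when there is no next row.
SELECT ActivityCode,
       StartTime,
       LEAD(StartTime, 1, StartTime) OVER (ORDER BY StartTime) AS EndTime
FROM Activity;
```

If there were a grouping column such as a person, it would go into a PARTITION BY clause inside OVER.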

get the latest records

I am currently still on my SQL educational journey and need some help!
The query I have is as below;
SELECT
Audit_Non_Conformance_Records.kf_ID_Client_Reference_Number,
Audit_Non_Conformance_Records.TimeStamp_Creation,
Audit_Non_Conformance_Records.Clause,
Audit_Non_Conformance_Records.NC_type,
Audit_Non_Conformance_Records.NC_Rect_Received,
Audit_Non_Conformance_Records.Audit_Num
FROM Audit_Non_Conformance_Records
I am trying to tweak this to show only the most recent results based on Audit_Non_Conformance_Records.TimeStamp_Creation
I have tried using MAX(), but all this does is show the latest date for all records.
Basically, the results of the above give me this:
But I only need the result with the date 02/10/2019, as this is the latest result. There may be multiple results, however. So, for example, if 02/10/2019 had never happened I would need all of the individual records from the 14/10/2019 ones.
Does that make any sense at all?
You can filter with a subquery:
SELECT
kf_ID_Client_Reference_Number,
TimeStamp_Creation,
Clause,
NC_type,
NC_Rect_Received,
Audit_Num
FROM Audit_Non_Conformance_Records a
where TimeStamp_Creation = (
select max(TimeStamp_Creation)
from Audit_Non_Conformance_Records
)
This will give you all rows whose TimeStamp_Creation is equal to the greatest value in the table.
If you want all records that fall on the latest day (excluding the time part), then you can do:
SELECT
kf_ID_Client_Reference_Number,
TimeStamp_Creation,
Clause,
NC_type,
NC_Rect_Received,
Audit_Num
FROM Audit_Non_Conformance_Records a
where cast(TimeStamp_Creation as date) = (
select cast(max(TimeStamp_Creation) as date)
from Audit_Non_Conformance_Records
)
Edit
If you want the latest record per refNumber (refNumber here stands for your reference-number column, presumably kf_ID_Client_Reference_Number), then you can correlate the subquery, like so:
SELECT
kf_ID_Client_Reference_Number,
TimeStamp_Creation,
Clause,
NC_type,
NC_Rect_Received,
Audit_Num
FROM Audit_Non_Conformance_Records a
where TimeStamp_Creation = (
select max(TimeStamp_Creation)
from Audit_Non_Conformance_Records a1
where a1.refNumber = a.refNumber
)
For performance, you want an index on (refNumber, TimeStamp_Creation).
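A minimal sketch of such an index in SQL Server syntax follows. The index name IX_ANCR_Ref_Created is made up for illustration, and the sketch assumes that refNumber corresponds to the kf_ID_Client_Reference_Number column from the question.

```sql
-- Sketch: composite index supporting the correlated MAX() lookup.
-- Assumptions: hypothetical index name; refNumber = kf_ID_Client_Reference_Number.
CREATE INDEX IX_ANCR_Ref_Created
    ON Audit_Non_Conformance_Records (kf_ID_Client_Reference_Number, TimeStamp_Creation);
```

With the reference number as the leading key, the latest TimeStamp_Creation for each reference number sits at the end of that key's range in the index.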
If you want the latest date in SQL Server, you can express this as:
SELECT TOP (1) WITH TIES ancr.kf_ID_Client_Reference_Number,
ancr.TimeStamp_Creation,
ancr.Clause,
ancr.NC_type,
ancr.NC_Rect_Received,
ancr.Audit_Num
FROM Audit_Non_Conformance_Records ancr
ORDER BY CONVERT(date, ancr.TimeStamp_Creation) DESC;
SQL Server is pretty good about handling dates with conversions, so I would not be surprised if this used an index on TimeStamp_Creation.

Joining multiple tables returning duplicates

I am trying the following select statement, which includes columns from 4 tables. But the results return each row 4 times. I'm sure this is because I have multiple left joins, but I have tried other joins and cannot get the desired result.
select table1.empid,table2.name,table2.datefrom, table2.UserDefNumber1, table3.UserDefNumber1, table4.UserDefChar6
from table1
inner join table2
on table2.empid=table1.empid
inner join table3
on table3.empid=table1.empid
inner join table4
on table4.empid=table1.empid
where MONTH(table2.datefrom) = Month (Getdate())
I need this to return the data without any duplicates, so only 1 row for each entry.
I would also like the "where Month" clause at the end to look at the previous month, not the current month, but I am struggling with that also.
I am a bit new to this, so I hope it makes sense.
Thanks
If the duplicate rows are identical on each column you can use the DISTINCT keyword to eliminate those duplicates.
But I think you should reconsider your JOIN or WHERE clause, because there has to be a reason for those duplicates:
- The WHERE clause hits several rows in table2 having the same month on a single empid
- There are several rows with the same empid in one of the other tables
- Both of the above are true
You may want to rule those duplicate rows out by conditions in WHERE/JOIN instead of the DISTINCT keyword as there may be unexpected behaviour when some data is changing in a single row of the original resultset. Then you start having duplicate empids again.
You can check whether a date falls in the previous month with the following condition:
date >= dateadd(mm, -1, datefromparts(year(getdate()), month(getdate()), 1))
AND date < datefromparts(year(getdate()), month(getdate()), 1)
This statement uses DATEFROMPARTS to build the first day of the current month twice, subtracts a month from the first one using DATEADD (which gives the first day of the previous month), and checks whether date lies between those two dates. A half-open range is used rather than BETWEEN, because BETWEEN is inclusive at both ends and would also match the first day of the current month.
If your query is returning duplicates, then one or more of the tables have duplicate empid values. This is a data problem. You can find them with queries like this:
select empid, count(*)
from table1
group by empid
having count(*) > 1;
You should really fix the data and query so it returns what you want. You can do a bandage solution with select distinct, but I would not usually recommend that. Something is causing the duplicates, and if you do not understand why, then the query may not be returning the results you expect.
As for your where clause. Given your logic, the proper way to express this would include the year:
where year(table2.datefrom) = year(getdate()) and
month(table2.datefrom) = month(Getdate())
Although there are other ways to express this logic that are more compatible with indexes, you can continue down this course with:
where year(table2.datefrom) * 12 + month(table2.datefrom) = year(getdate()) * 12 + Month(Getdate()) - 1
That is, convert the months to a number of months since time zero and then use month arithmetic.
If you care about indexes, then your current where clause would look like:
where table2.datefrom >= dateadd(day,
                                 - (day(getdate()) - 1),
                                 cast(getdate() as date)) and
      table2.datefrom < dateadd(day,
                                - (day(dateadd(month, 1, getdate())) - 1),
                                cast(dateadd(month, 1, getdate()) as date))
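For the previous month, which is what the question actually asks for, the same index-friendly range idea can be sketched with the DATEADD/DATEDIFF idiom. This is only a sketch in SQL Server syntax; it selects from table2 alone to keep the example short and assumes datefrom is a date or datetime column.

```sql
-- Sketch: rows whose datefrom falls in the previous calendar month.
-- DATEDIFF(month, 0, GETDATE()) counts whole months since the base date 1900-01-01;
-- adding that count back to the base date gives the first day of the current month.
SELECT table2.empid, table2.name, table2.datefrom
FROM table2
WHERE table2.datefrom >= DATEADD(month, DATEDIFF(month, 0, GETDATE()) - 1, 0)  -- first day of previous month
  AND table2.datefrom <  DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0);     -- first day of current month
```

Because no function is applied to datefrom itself, an index on that column remains usable.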
Eliminate duplicates from your query by including the distinct keyword immediately after select.
Comparing against a previous month is slightly more complicated. It depends what you mean:
If the report was run on the 23rd Jan 2015, would you want 01/12/2014-31/12/2014 or 23/12/2014-22/01/2015?

Separate rows based on a column that has min value

I have a table that has attendance of employee. This table has two columns:
first is the personnel number
second is the time of arrival
I want to isolate the earliest time in this table, because an employee can register multiple times.
In other words, I want the earliest AttendanceTime for each PersonNo.
I wrote the following code, but it's wrong and doesn't separate the rows:
SELECT tal.PersonNo, min(tal.AttendanceTime)
FROM mqa.T_AttendanceLog tal
GROUP BY tal.PersonNo, tal.AttendanceTime
You're almost there. Just remove the AttendanceTime from the group by.
SELECT tal.PersonNo, min(tal.AttendanceTime)
FROM mqa.T_AttendanceLog tal
GROUP BY tal.PersonNo;
If you want the entire row (in case you have other columns), you can use something like this:
select *
from mqa.T_AttendanceLog a
where (PersonNo, AttendanceTime) in(
select b.PersonNo, min(b.AttendanceTime)
from mqa.T_AttendanceLog b
group by b.PersonNo);
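Note that the row-value form (PersonNo, AttendanceTime) IN (...) is not accepted by every database; SQL Server, for one, rejects it. A more portable sketch of the same idea joins the table to its per-person minimum. The alias names m and FirstTime are made up for illustration.

```sql
-- Sketch: return the full row holding each person's earliest AttendanceTime.
SELECT a.*
FROM mqa.T_AttendanceLog AS a
INNER JOIN (
    SELECT PersonNo, MIN(AttendanceTime) AS FirstTime
    FROM mqa.T_AttendanceLog
    GROUP BY PersonNo
) AS m
    ON m.PersonNo = a.PersonNo
   AND m.FirstTime = a.AttendanceTime;
```

If a person has two rows with the identical earliest time, both rows are returned.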
Modify your group by clause
SELECT tal.PersonNo,min(tal.AttendanceTime)
FROM mqa.T_AttendanceLog tal
GROUP BY tal.PersonNo
I think you will need the minimum of AttendanceTime for each day. Try this:
SELECT tal.PersonNo,min(CAST(tal.AttendanceTime AS Time))
FROM mqa.T_AttendanceLog tal
GROUP BY tal.PersonNo,CAST(tal.AttendanceTime AS Date)

Group by in t-sql not displaying single result

See the image below. I have a table, tbl_AccountTransaction, in which I have 10 rows. It is the lowermost table, with the columns AccountTransactionId, AgreementId and so on. What I want is to get a single row that is the sum of all amounts for an agreement id. Say I have agreement id = 23: when I run my query it gives me two rows instead of a single row, since there is a difference of a few milliseconds between the times of insertion.
So I need a way that will give me the row 1550 | 23 | 2011-03-21
Update
I have updated my query to this
SELECT Sum(Amount) as Amount,AgreementID, StatementDate
FROM tbl_AccountTranscation
Where TranscationDate is null
GROUP BY AgreementID,Convert(date,StatementDate,101)
but still getting the same error
Msg 8120, Level 16, State 1, Line 1
Column 'tbl_AccountTranscation.StatementDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Your group by clause is in error
group by agreementid, convert(date,statementdate,101)
This makes it group by the date (without time) of the statementdate column. Whereas the original is grouping by the statementdate (including time) then for each row of the output, applying the stripping of time information.
To be clear, you weren't supposed to change the SELECT clause
SELECT Sum(Amount) as Amount,AgreementID, Convert(date,StatementDate,101)
FROM tbl_AccountTranscation
Where TranscationDate is null
GROUP BY AgreementID,Convert(date,StatementDate,101)
Because you have a Group By StatementDate.
In your example you have 2 StatementDates:
2011-03-21 14:38:59.470
2011-03-21 14:38:59.487
Change your query in the Group by section instead of StatementDate to be:
Convert(Date, StatementDate, 101)
Have you tried to
Group by (Convert(date,...)
instead of the StatementDate?
You are close. You need to combine your two approaches. This should do it:
SELECT Sum(Amount) as Amount,AgreementID, Convert(date,StatementDate,101)
FROM tbl_AccountTranscation
Where TranscationDate is null
GROUP BY AgreementID,Convert(date,StatementDate,101)
If you never need the time, then perhaps you need to change the datatype, so you don't have to do a lot of unnecessary converting in most queries. SQL Server 2008 has a date datatype that doesn't include the time. In earlier versions you could add an additional date column that is automatically generated to strip out the time component, so that all the dates are in the format of '2011-01-01 00:00:00:000'. Then you can do date comparisons directly, having only had to do the conversion once. This would allow you to have both the actual datetime and just the date.
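One way the automatically generated column could look in SQL Server is a computed column. This is only a sketch: the column name StatementDateOnly is hypothetical, and it assumes StatementDate is a datetime column in tbl_AccountTranscation.

```sql
-- Sketch: computed column that strips the time part from StatementDate.
-- DATEDIFF counts whole days since the base date 0 (1900-01-01);
-- DATEADD turns that count back into a datetime at midnight.
ALTER TABLE tbl_AccountTranscation
    ADD StatementDateOnly AS DATEADD(day, DATEDIFF(day, 0, StatementDate), 0);

-- Run this as a separate batch, after the column has been added.
SELECT SUM(Amount) AS Amount, AgreementID, StatementDateOnly
FROM tbl_AccountTranscation
WHERE TranscationDate IS NULL
GROUP BY AgreementID, StatementDateOnly;
```

Queries can then group or compare on StatementDateOnly directly, while StatementDate keeps the full timestamp.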
You should group by DATEPART(..., StatementDate)
Ref: http://msdn.microsoft.com/en-us/library/ms174420.aspx