SQL Server Group By Query Select first row each group - sql

I am trying to do this query. This is what I have.
My table is: Table
StudyID FacultyID Year Access1 Access2 Access3
1 1 2014 4 8 5
1 2 2014 8 4 7
1 1 2013 5 4 4
2 3 2014 4 6 5
2 5 2013 5 8 10
2 4 2014 5 5 7
3 7 2013 9 4 7
I want to group by StudyID and Year, get the minimum value of each of Access1, Access2 and Access3, and show only the latest year for each StudyID, that is, the first row of each group.
Here is the expected result:
StudyID Year Access1 Access2 Access3
1 2014 4 4 5
2 2014 4 5 5
3 2013 9 4 7
This is my Query:
SELECT DISTINCT T.StudyID, T.Year, MIN(T.Access1), MIN(T.Access2), MIN(T.Access3)
FROM T
GROUP BY T.StudyID, T.Year
ORDER BY T.StudyID, T.Year DESC
I also tried with this one.
;WITH MyQuery AS ( SELECT DISTINCT T.StudyID, T.Year, MIN(T.Access1), MIN(T.Access2), MIN(T.Access3),ROW_NUMBER() OVER (PARTITION BY T.StudyID, T.Year ORDER BY T.StudyID, T.Year DESC) AS rownumber
FROM T GROUP BY T.StudyID, T.Year ORDER BY T.StudyID , T.Year DESC ) SELECT * FROM MyQuery WHERE rownumber = 1
No success. I know I am missing something, but I don't know what.
Thanks in advance!!!!

You can GROUP BY StudyID, Year and then, in an outer query, select the first row for each StudyID:
SELECT StudyID, Year, minAccess1, minAccess2, minAccess3
FROM (
SELECT StudyID, Year, min(Access1) minAccess1, min(Access2) minAccess2,
min(Access3) minAccess3,
ROW_NUMBER() OVER (PARTITION BY StudyID ORDER BY Year DESC) AS rn
FROM mytable
GROUP BY StudyID, Year ) t
WHERE t.rn = 1
ROW_NUMBER assigns a sequence number to the rows within each StudyID partition, ordered by Year descending, so the row with the maximum Year value gets rn = 1.
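A minimal sketch for checking this approach against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; SQLite 3.25 or later is needed for window functions), and it splits the query into two CTEs, `grouped` and `ranked` (names chosen here for illustration), to make the order of the steps explicit.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (StudyID INT, FacultyID INT, Year INT,"
    " Access1 INT, Access2 INT, Access3 INT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 2014, 4, 8, 5),
        (1, 2, 2014, 8, 4, 7),
        (1, 1, 2013, 5, 4, 4),
        (2, 3, 2014, 4, 6, 5),
        (2, 5, 2013, 5, 8, 10),
        (2, 4, 2014, 5, 5, 7),
        (3, 7, 2013, 9, 4, 7),
    ],
)

QUERY = """
WITH grouped AS (   -- step 1: one row per (StudyID, Year) with the minima
    SELECT StudyID, Year,
           MIN(Access1) AS minAccess1,
           MIN(Access2) AS minAccess2,
           MIN(Access3) AS minAccess3
    FROM mytable
    GROUP BY StudyID, Year
),
ranked AS (         -- step 2: number the years within each StudyID, latest first
    SELECT grouped.*,
           ROW_NUMBER() OVER (PARTITION BY StudyID ORDER BY Year DESC) AS rn
    FROM grouped
)
SELECT StudyID, Year, minAccess1, minAccess2, minAccess3
FROM ranked
WHERE rn = 1        -- step 3: keep only the latest year per StudyID
ORDER BY StudyID
"""

latest_rows = conn.execute(QUERY).fetchall()
for row in latest_rows:
    print(row)
```

The key point is that the partition is StudyID alone: partitioning by StudyID and Year, as in the question's attempt, makes every grouped row the first row of its own partition.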

Try this:
SELECT T.StudyID, T.Year, MIN(T.Access1), MIN(T.Access2), MIN(T.Access3)
FROM myTable T
WHERE T.Year = (SELECT MAX(T2.Year) FROM myTable T2 WHERE T2.StudyID = T.StudyID)
GROUP BY T.StudyID, T.Year
This gives the result you wanted in SQLite. SQL Server requires every non-aggregated column in the SELECT list to appear in GROUP BY, which is why T.Year is listed there; I can't test it on SQL Server right now.

This gives the answer you want:
SELECT DISTINCT T.StudyID, T.Year, MIN(T.Access1) as Access1, MIN(T.Access2) as Access2, MIN(T.Access3) as Access3
FROM T T
WHERE T.Year = (SELECT MAX(T2.Year) FROM T T2 WHERE T2.StudyID = T.StudyID)
GROUP BY T.StudyID, T.Year
Order by 1

Related

SUM and MAX function in SQL with multiple group by clause causes issue

I have the following table:
id student period point
1 1 Q1 0
2 2 Q1 2
3 2 Q2 5
4 2 Q3 0
5 3 Q1 7
6 3 Q1 8
7 3 Q2 3
8 3 Q2 1
9 3 Q3 0
10 3 Q3 0
11 4 Q1 1
12 4 Q3 9
For each period, I want to know which student has the most points in total.
When I execute this query:
SELECT
MAX(SUM(point)) score,
student,
`period`
FROM table1
GROUP BY student, `period`
it gives the following error:
#1111 - Invalid use of group function
When I execute this query:
SELECT
`period`,
student,
MAX(p) score
FROM
(
SELECT
SUM(point) p,
student,
`period`
FROM table1
GROUP BY student, `period`
) t1
GROUP BY `period`
it gives the following result:
period student score
Q1 1 15
Q2 1 5
Q3 1 9
The periods and their max points are good, but I always have the first student id.
Expected output:
period student score
Q1 3 15
Q2 2 5
Q3 4 9
On top of that, if more than one student has the highest points in a period, I want to know all of them.
You could use the MAX window function as follows:
WITH sum_pt AS
(
SELECT student, period,
SUM(point) AS st_period_pt
FROM table1
GROUP BY student, period
),
max_sum as
(
SELECT *,
MAX(st_period_pt) OVER (PARTITION BY period) AS max_pt_sum
FROM sum_pt
)
SELECT student, period, st_period_pt
FROM max_sum
WHERE st_period_pt = max_pt_sum
ORDER BY period
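As a rough check of the tie behaviour the question asks about, the sketch below runs the same two-CTE query with Python's sqlite3 module (an assumption; the question targets MySQL, and window functions need SQLite 3.25 or later). The row with id 13 is hypothetical and is added only to create a tie in Q2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INT, student INT, period TEXT, point INT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Q1", 0), (2, 2, "Q1", 2), (3, 2, "Q2", 5), (4, 2, "Q3", 0),
        (5, 3, "Q1", 7), (6, 3, "Q1", 8), (7, 3, "Q2", 3), (8, 3, "Q2", 1),
        (9, 3, "Q3", 0), (10, 3, "Q3", 0), (11, 4, "Q1", 1), (12, 4, "Q3", 9),
    ],
)

QUERY = """
WITH sum_pt AS (    -- total points per (student, period)
    SELECT student, period, SUM(point) AS st_period_pt
    FROM table1
    GROUP BY student, period
),
max_sum AS (        -- attach the best total of the period to every row
    SELECT *, MAX(st_period_pt) OVER (PARTITION BY period) AS max_pt_sum
    FROM sum_pt
)
SELECT period, student, st_period_pt
FROM max_sum
WHERE st_period_pt = max_pt_sum   -- keeps every student that reaches the maximum
ORDER BY period, student
"""

winners = conn.execute(QUERY).fetchall()
print(winners)

# Hypothetical extra row: raises student 3's Q2 total from 4 to 5, tying student 2.
conn.execute("INSERT INTO table1 VALUES (13, 3, 'Q2', 1)")
winners_with_tie = conn.execute(QUERY).fetchall()
print(winners_with_tie)
```

Because the filter compares each total with the period maximum instead of numbering the rows, both tied students come back for Q2.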
Try with window functions:
SUM, to get the total points for each <student, period> pair
ROW_NUMBER, to rank points for each period
Then you can select where ranking = 1 to get your highest points for each period.
WITH students_with_total_points AS (
SELECT *, SUM(point) OVER(PARTITION BY student, period) AS total_points
FROM table1
), ranking_on_periods AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY period ORDER BY total_points DESC) AS rn
FROM students_with_total_points
)
SELECT id, student, period, total_points
FROM ranking_on_periods
WHERE rn = 1
You could use a left join as follows:
select t1.period, t1.student, t1.score
from (
select student, period, SUM(point) as score
from table1
group by student, period
) as t1
left join (
select student, period, SUM(point) as score
from table1
group by student, period
) as t2 on t1.period = t2.period and t1.score < t2.score
where t2.score is null;
For each row of t1, the join looks for a row in the same period with a higher score. Rows that find no such match hold the period's maximum, so tied students are all returned.
This query also lists students whose total is 0 when that is the highest score of the period; you can exclude them by adding a WHERE clause to the t1 and t2 derived tables.

Query to restrict results from left join

I have the following query
select S.id, X.id, 15,15,1 from schema_1.tbl_2638 S
JOIN schema_1.tbl_2634_customid X on S.field_1=x.fullname
That returns the following results, where you can see the first column is duplicated on matches to the 2nd table.
1 1 15 15 1
2 3 15 15 1
2 2 15 15 1
3 5 15 15 1
3 4 15 15 1
I'm trying to get a query that would just give me a single row per 1st ID, and the min value from 2nd ID. So I want a result that would be:
1 1 15 15 1
2 2 15 15 1
3 4 15 15 1
I'm a little rusty on my SQL skills. How would I write the query to provide the above result?
Based on your result, you can do the following. For more complicated structures, you can always take a look at window functions.
select S.id, MIN(X.id) x_id, 15,15,1 from schema_1.tbl_2638 S
JOIN schema_1.tbl_2634_customid X on S.field_1=x.fullname
GROUP BY 1,3,4,5
A window function can also be used; it always needs an outer SELECT:
SELECT
s_id, x_id, a, b, c
FROM
(select S.id as s_id, X.id as x_id, 15 a ,15 b,1 c
, ROW_NUMBER() OVER (PARTITION BY S.id ORDER BY X.id ASC) rn
from schema_1.tbl_2638 S
JOIN schema_1.tbl_2634_customid X on S.field_1=x.fullname) t
WHERE rn = 1
Or as a CTE:
WITH CTE as (select S.id as s_id, X.id as x_id, 15 a ,15 b,1 c
, ROW_NUMBER() OVER (PARTITION BY S.id ORDER BY X.id ASC) rn
from schema_1.tbl_2638 S
JOIN schema_1.tbl_2634_customid X on S.field_1=x.fullname)
SELECT s_id,x_id,a,b,c FROM CTE WHERE rn = 1

How to select top 2 values for each id

I have a table with values
id sales date
1 5 "2015-01-04"
1 3 "2015-01-03"
1 1 "2015-01-01"
1 1 "2015-01-01"
2 7 "2015-01-05"
2 6 "2015-01-04"
2 4 "2015-01-03"
3 11 "2015-01-08"
3 10 "2015-01-07"
3 9 "2015-01-06"
3 8 "2015-01-05"
I want to select top two values of each id as shown in desired output.
Desired output:
id sales date
1 5 "2015-01-04"
1 3 "2015-01-03"
2 7 "2015-01-05"
2 6 "2015-01-04"
3 11 "2015-01-08"
3 10 "2015-01-07"
My attempt:
select transactions.salesperson_id, transactions.id, transactions.date
from transactions
ORDER BY transactions.salesperson_id ASC, transactions.date DESC;
Can someone help me with this? Thank you in advance!
This can be done using window functions:
select id, sales, "date"
from (
select id, sales, "date",
dense_rank() over (partition by id order by "date" desc) as rnk
from transactions
) t
where rnk <= 2;
If there are multiple rows on the same date this might return more than two rows for the same ID. If you don't want that, use row_number() instead of dense_rank()
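The difference between the two ranking functions can be seen on the question's own data, where id 1 has two rows dated 2015-01-01. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module (SQLite 3.25 or later), and the helper `top_n` and its cutoff parameter are hypothetical names introduced here. With a cutoff of 2 both functions agree on this data; with a cutoff of 3 the duplicate date makes dense_rank() return one extra row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INT, sales INT, date TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, 5, "2015-01-04"), (1, 3, "2015-01-03"),
        (1, 1, "2015-01-01"), (1, 1, "2015-01-01"),
        (2, 7, "2015-01-05"), (2, 6, "2015-01-04"), (2, 4, "2015-01-03"),
        (3, 11, "2015-01-08"), (3, 10, "2015-01-07"),
        (3, 9, "2015-01-06"), (3, 8, "2015-01-05"),
    ],
)


def top_n(rank_function, n):
    """Return the rows whose per-id rank (by date, newest first) is at most n."""
    # Only these two fixed function names are ever interpolated into the SQL.
    if rank_function not in ("dense_rank", "row_number"):
        raise ValueError("unsupported ranking function")
    query = f"""
        SELECT id, sales, date
        FROM (
            SELECT id, sales, date,
                   {rank_function}() OVER (PARTITION BY id ORDER BY date DESC) AS rnk
            FROM transactions
        ) t
        WHERE rnk <= ?
        ORDER BY id, date DESC, sales DESC
    """
    return conn.execute(query, (n,)).fetchall()


print(top_n("dense_rank", 2))       # the desired output from the question
print(len(top_n("dense_rank", 3)))  # the tied dates for id 1 both get rank 3
print(len(top_n("row_number", 3)))  # exactly three rows per id
```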
row_number() will get what you want.
select id, sales, date from
(select row_number() over (partition by id order by date desc) as rn, id, sales, date from transactions) t1
where t1.rn <= 2

pick all positive least numbers from data set [duplicate]

I have the data below in a table:
ID AMOUNT DAYS
1 10 1
1 20 2
1 30 3
1 1 4
2 34 1
2 234 2
2 234 3
2 34 4
3 3 1
3 3 2
3 23 3
3 20 4
I want the result below: for each ID, the rows with the smallest DAYS value.
ID AMOUNT DAYS
1 10 1
2 34 1
3 3 1
Please suggest a SQL query that produces this output.
For your example, you can simply do:
select t.*
from t
where t.days = 1;
If 1 is not fixed, then a correlated subquery is one method:
select t.*
from t
where t.days = (select min(t2.days) from t t2 where t2.id = t.id);
Another method is aggregation:
select t.id, min(t.days) as min_days,
min(t.amount) keep (dense_rank first order by t.days asc) as min_amount
from t
group by t.id;
Of course row_number()/rank() is another alternative.
With an index on (id, days) and a large table, one of the above methods may be faster in practice.
You can use rank() function
select ID, Amount, Days from
(
select rank() over (partition by ID order by days) as rn,
t.*
from tab t
)
where rn = 1;
First group by id to find the min days for each id and then join to the table
select t.*
from tablename t inner join (
select id, min(days) days
from tablename
group by id
) g on g.id = t.id and g.days = t.days
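A small sketch that runs the group-then-join query on the question's data and compares it with the correlated-subquery form. It assumes Python's sqlite3 module in place of Oracle, and the table name `tablename` simply follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INT, amount INT, days INT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        (1, 10, 1), (1, 20, 2), (1, 30, 3), (1, 1, 4),
        (2, 34, 1), (2, 234, 2), (2, 234, 3), (2, 34, 4),
        (3, 3, 1), (3, 3, 2), (3, 23, 3), (3, 20, 4),
    ],
)

# Group first to find the minimum days per id, then join back to the table.
JOIN_QUERY = """
SELECT t.id, t.amount, t.days
FROM tablename t
INNER JOIN (
    SELECT id, MIN(days) AS days
    FROM tablename
    GROUP BY id
) g ON g.id = t.id AND g.days = t.days
ORDER BY t.id
"""

# Same result expressed as a correlated subquery.
SUBQUERY_QUERY = """
SELECT t.id, t.amount, t.days
FROM tablename t
WHERE t.days = (SELECT MIN(t2.days) FROM tablename t2 WHERE t2.id = t.id)
ORDER BY t.id
"""

joined = conn.execute(JOIN_QUERY).fetchall()
correlated = conn.execute(SUBQUERY_QUERY).fetchall()
print(joined)
```

Both forms return every row that shares the minimum, so an ID with two rows on its smallest DAYS value would appear twice.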

Sql query to Count Total Consecutive Years from latest year

I have a table Temp:
CREATE TABLE Temp
(
[ID] [int],
[Year] [INT],
)
ID Year
1 2016
1 2016
1 2015
1 2012
1 2011
1 2010
2 2016
2 2015
2 2014
2 2012
2 2011
2 2010
2 2009
3 2016
3 2015
3 2004
3 1999
4 2016
4 2015
4 2014
4 2010
5 2016
5 2014
5 2013
I want to calculate the total consecutive years starting from the most recent Year.
Result should look like this:
ID Total Consecutive Yrs
1 2
2 3
3 2
4 3
5 1
select ID, year,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year +1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
e.g. for ID=1:
1 2016 1 1
1 2015 2 2
1 2012 5 3
1 2011 6 4
1 2010 7 5
As long as there's no gap, both sequences increase the same.
Now check for equal sequences and count the rows:
with cte as
(
select ID,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year + 1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
)
select ID, count(*)
from cte
where x = rn -- no gap
group by ID
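The comparison of the two sequences can be tried on the question's data with the sketch below. Two assumptions are made: Python's sqlite3 module (SQLite 3.25 or later) stands in for SQL Server, and the rows are de-duplicated first through a hypothetical `distinct_years` CTE, because ID 1 lists 2016 twice and a repeated year would push rn ahead of x.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Temp (ID INT, Year INT)")
years_by_id = {
    1: [2016, 2016, 2015, 2012, 2011, 2010],
    2: [2016, 2015, 2014, 2012, 2011, 2010, 2009],
    3: [2016, 2015, 2004, 1999],
    4: [2016, 2015, 2014, 2010],
    5: [2016, 2014, 2013],
}
conn.executemany(
    "INSERT INTO Temp VALUES (?, ?)",
    [(id_, year) for id_, years in years_by_id.items() for year in years],
)

QUERY = """
WITH distinct_years AS (   -- assumption: repeated years count once
    SELECT DISTINCT ID, Year FROM Temp
),
cte AS (
    SELECT ID,
           -- distance from the latest year, starting at 1
           first_value(Year) OVER (PARTITION BY ID ORDER BY Year DESC) - Year + 1 AS x,
           -- plain row counter, also starting at 1
           row_number() OVER (PARTITION BY ID ORDER BY Year DESC) AS rn
    FROM distinct_years
)
SELECT ID, count(*)
FROM cte
WHERE x = rn               -- equal only while no year has been skipped
GROUP BY ID
ORDER BY ID
"""

consecutive = conn.execute(QUERY).fetchall()
print(consecutive)
```

Once a gap appears, x jumps ahead of rn and never falls back, so only the leading run of years passes the filter.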
Edit:
Based on your year zero comment:
with cte as
(
select ID, year,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year + 1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
)
select ID,
-- remove the year zero from counting
sum(case when year <> 0 then 1 else 0 end)
from cte
where x = rn
group by ID
You can use LEAD to get these counts, as below:
Select top (1) with ties Id, RowN as [Total Consecutive Years] from (
Select *, Num = case when ([year]- lead(year) over(partition by Id order by [Year] desc) > 1) then 0 else 1 end
, RowN = Row_Number() over (partition by Id order by [Year] desc)
from temp
) a
where a.Num = 0
order by row_number() over(partition by Id order by RowN)
Output as below:
+----+-------------------------+
| Id | Total Consecutive Years |
+----+-------------------------+
| 1 | 2 |
| 2 | 3 |
| 3 | 2 |
| 4 | 3 |
| 5 | 1 |
+----+-------------------------+
You can do this using window functions:
select id, count(distinct year)
from (select t.*,
dense_rank() over (partition by id order by year + seqnum desc) as grp
from (select t.*,
dense_rank() over (partition by id order by year desc) as seqnum
from temp t
) t
) t
where grp = 1
group by id;
This assumes that "most recent year" is per id.
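To see why ordering by year + seqnum isolates the most recent run, the sketch below prints the intermediate values for ID 1 only. It is an illustration under assumptions: Python's sqlite3 module replaces SQL Server, and the column alias `run_key` is a name introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp (id INT, year INT)")
# Rows for ID 1 from the question, including the repeated 2016.
conn.executemany(
    "INSERT INTO temp VALUES (?, ?)",
    [(1, 2016), (1, 2016), (1, 2015), (1, 2012), (1, 2011), (1, 2010)],
)

QUERY = """
SELECT DISTINCT year, seqnum, year + seqnum AS run_key
FROM (
    SELECT t.*,
           dense_rank() OVER (PARTITION BY id ORDER BY year DESC) AS seqnum
    FROM temp t
) AS s
ORDER BY year DESC
"""

run_keys = conn.execute(QUERY).fetchall()
for year, seqnum, run_key in run_keys:
    print(year, seqnum, run_key)
```

Within a run of consecutive years, year drops by one while seqnum rises by one, so their sum stays constant (2017 for 2016 and 2015, then 2015 for 2012 to 2010). The highest sum belongs to the most recent run, which is why grp = 1 selects it, and dense_rank together with count(distinct year) keeps the repeated 2016 from being counted twice.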
Gordon Linoff,
Your code is awesome!
Your code pulls consecutive years from the most recent year.
I modified it to pull overall max consecutive years.
Posted here in case anyone else needs it:
--overall max consecutive years
select id,max(yr_cnt) max_consecutive_years
from (
select id, grp,count(distinct year) yr_cnt
from (select t.*,
dense_rank() over (partition by id order by year + seqnum desc) as grp
from (select t.*,
dense_rank() over (partition by id order by year desc) as seqnum
from temp t
) t
) t
group by id,grp) t2
group by id;