Displaying value in only last row of a partition - sql

I am generating a transcript for students in a database. I am using window functions to provide running semester and cumulative GPAs. My query is as follows:
SELECT FIRSTNAME, COURSENAME, SCORE, MAX_SCORE, SEMESTER, SGPA, AVG(ROUND(SGPA,2)) OVER (PARTITION BY FIRSTNAME ORDER BY SEMESTER) AS CGPA
FROM(
SELECT FIRSTNAME, COURSENAME, SCORE, MAX_SCORE, SEMESTER, ROUND((SEM_TOTAL_SCORE * 4/SEM_TOTAL_MAX_SCORE ),2) AS SGPA FROM(
SELECT FIRSTNAME, COURSENAME, SCORE, MAX_SCORE, SEMESTER,
SUM(SCORE)
OVER (PARTITION BY FIRSTNAME ORDER BY SEMESTER) AS SEM_TOTAL_SCORE,
SUM(MAX_SCORE)
OVER (PARTITION BY FIRSTNAME ORDER BY SEMESTER) AS SEM_TOTAL_MAX_SCORE
FROM STUDENT_GRADE_COURSE
)
);
I get the following results:
The results are correct, but I do not want to display the SGPA and CGPA on every row. I want them displayed only on the last row of each partition, which in this case is the semester. So the last row of semester 1 and the last row of semester 2 should show the GPAs, and on all other rows those columns should be empty.
How can I do that?
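One possible way to approach this is to number the rows inside each semester in reverse display order and emit the value only where that number is 1. The sketch below is not the original query: it uses Python's built-in sqlite3 module (SQLite 3.25 or later is assumed for window functions), a hypothetical `grades` table, and it assumes that `coursename` defines the row order within a semester.

```python
import sqlite3

# Hypothetical, simplified data: one SGPA value repeated on every row of a semester.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grades (firstname TEXT, coursename TEXT, semester INTEGER, sgpa REAL);
INSERT INTO grades VALUES
  ('Ann', 'Math',    1, 3.5),
  ('Ann', 'Physics', 1, 3.5),
  ('Ann', 'Art',     2, 3.8),
  ('Ann', 'Biology', 2, 3.8);
""")

# ROW_NUMBER() counts backwards (coursename DESC) within each student and
# semester, so the row displayed last by the outer ORDER BY gets number 1.
# CASE without ELSE yields NULL on every other row.
rows = conn.execute("""
SELECT firstname, coursename, semester,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY firstname, semester
                                    ORDER BY coursename DESC) = 1
            THEN sgpa
       END AS sgpa_last_row_only
FROM grades
ORDER BY firstname, semester, coursename
""").fetchall()

for row in rows:
    print(row)
```

The same CASE wrapper could be applied to the CGPA column. The window ordering must be the exact reverse of the final ORDER BY, otherwise the value may land on a row that is not displayed last.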

Related

Is this SQL query possible? I am trying to get the least frequent names in this table

This is a toy public table in google BigQuery:
The table contains the names given to people in the US at birth and the frequency of those names for each state and year from 1910 to 2020
Columns: name, year, state, number, gender
I am trying to get the LEAST popular names (names with lowest 'number' column) each year.
I am not sure this is possible with this schema.
Depending on how you want to handle a tie, you want rank or row_number.
select
*
from (
select
*,
row_number() over (partition by year order by name_frequency) as rn
from (
select
year,
name,
sum(number) as name_frequency
from `bigquery-public-data.usa_names.usa_1910_2013`
group by
year,
name
) sub1
) sub2
where rn = 1
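To make the tie-handling remark concrete, here is a small sketch that runs the same ranking query twice, once with ROW_NUMBER and once with RANK. It uses Python's sqlite3 module rather than BigQuery (SQLite 3.25 or later is assumed), and the `name_counts` table with its rows is invented for illustration.

```python
import sqlite3

# Hypothetical pre-aggregated data: 'Ada' and 'Bea' tie for least popular in 1910.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name_counts (year INTEGER, name TEXT, name_frequency INTEGER);
INSERT INTO name_counts VALUES
  (1910, 'Ada', 5), (1910, 'Bea', 5), (1910, 'Cy', 9),
  (1911, 'Dot', 7), (1911, 'Eli', 8);
""")

def least_popular(rank_fn):
    """Return (year, name) rows ranked first by the given window function."""
    sql = f"""
    SELECT year, name
    FROM (
      SELECT year, name,
             {rank_fn}() OVER (PARTITION BY year ORDER BY name_frequency) AS rn
      FROM name_counts
    ) AS ranked
    WHERE rn = 1
    ORDER BY year, name
    """
    return conn.execute(sql).fetchall()

with_row_number = least_popular("ROW_NUMBER")  # exactly one name per year
with_rank = least_popular("RANK")              # every tied name per year

print(with_row_number)
print(with_rank)
```

ROW_NUMBER keeps a single, arbitrarily chosen name when several share the lowest frequency, while RANK keeps all of them.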
Consider below approach
select * from (
select year, gender, name, sum(number) number
from `bigquery-public-data.usa_names.usa_1910_2013`
group by year, gender, name
qualify 1 = row_number() over(partition by year, gender order by number)
)
pivot (any_value(name) name, any_value(number) number for gender in ('M', 'F'))
# order by year desc
with output (just top 9 shown)
In reality, many names share the same lowest frequency. To get all of them, use the approach below
select * from (
select year, gender, name, sum(number) number
from `bigquery-public-data.usa_names.usa_1910_2013`
group by year, gender, name
qualify 1 = dense_rank() over(partition by year, gender order by number)
)
pivot (string_agg(name) name, any_value(number) number for gender in ('M', 'F'))
with output
If you were looking for the most frequent names instead, you would use the query below
select * from (
select year, gender, name, sum(number) number
from `bigquery-public-data.usa_names.usa_1910_2013`
group by year, gender, name
qualify 1 = dense_rank() over(partition by year, gender order by number desc)
)
pivot (string_agg(name) name, any_value(number) number for gender in ('M', 'F'))
with just one most frequent name per year/gender

Merge array with previous value (under defined conditions)

Here's the initial table's structure :
yearquarter,user_id,gender,generation,country,group_id
2019-03,zfuzhfuzh,M,Y,FR,Group_1
2019-04,zfuzhfuzh,M,Y,FR,Group_1
2020-04,zfuzhfuzh,M,Y,FR,Group_1
2019-03,ggezegz,F,Y,FR,Group_2
2019-04,ggezegz,F,Y,FR,Group_2
2020-04,ggezegz,F,X,FR,Group_2
....
I want to be able to know the cumulative amount of user_id quarter after quarter grouped by gender, generation and country. Expected result: for a given combination of gender,generation,country I need the cumulated number of users quarter after quarter.
I started with this :
SELECT yearquarter,gender,generation,country,array_agg(distinct user_id IGNORE NULLS) as users FROM mytable
WHERE group_id= "mygroup"
GROUP BY 1,2,3,4
But I don't know how to go from this to the result I'm looking for...
You can use aggregation to count the number of users per gender, generation, country and period, and then compute a window sum over the periods:
select
gender,
generation,
country,
yearquarter,
sum(count(distinct user_id)) over(partition by gender, generation, country order by yearquarter) cnt
from mytable
where group_id = 'mygroup'
group by gender, generation, country, yearquarter
order by gender, generation, country, yearquarter
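As a rough illustration of what this running sum produces, the sketch below rebuilds the idea in Python's sqlite3 module (SQLite 3.25 or later is assumed). It computes the per-quarter distinct counts in a CTE and then applies the window sum, which is a two-step equivalent rather than the exact statement above; the sample rows and user ids are made up.

```python
import sqlite3

# Hypothetical sample: user u1 is active in two different quarters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (yearquarter TEXT, user_id TEXT, gender TEXT,
                      generation TEXT, country TEXT, group_id TEXT);
INSERT INTO mytable VALUES
  ('2019-03', 'u1', 'M', 'Y', 'FR', 'mygroup'),
  ('2019-03', 'u2', 'M', 'Y', 'FR', 'mygroup'),
  ('2019-04', 'u1', 'M', 'Y', 'FR', 'mygroup'),
  ('2020-04', 'u3', 'M', 'Y', 'FR', 'mygroup');
""")

# Step 1: distinct users per quarter. Step 2: running total over quarters.
running = conn.execute("""
WITH per_quarter AS (
  SELECT gender, generation, country, yearquarter,
         COUNT(DISTINCT user_id) AS users_in_quarter
  FROM mytable
  WHERE group_id = 'mygroup'
  GROUP BY gender, generation, country, yearquarter
)
SELECT yearquarter,
       SUM(users_in_quarter) OVER (PARTITION BY gender, generation, country
                                   ORDER BY yearquarter) AS cnt
FROM per_quarter
ORDER BY yearquarter
""").fetchall()

print(running)
```

Note that a user who appears in several quarters, like u1 here, is counted once per quarter, so the running total can exceed the number of distinct users.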
I am not sure whether BigQuery supports DISTINCT in this position. If it doesn't, we can use a subquery:
select
gender,
generation,
country,
yearquarter,
sum(count(*)) over(partition by gender, generation, country order by yearquarter) cnt
from (
select distinct gender, generation, country, yearquarter, user_id
from mytable
where group_id = 'mygroup'
) t
group by gender, generation, country, yearquarter
order by gender, generation, country, yearquarter
If you want each user to be counted only once, for their first appearance period:
select
gender,
generation,
country,
yearquarter,
sum(count(*)) over(partition by gender, generation, country order by yearquarter) cnt
from (
select gender, generation, country, user_id, min(yearquarter) yearquarter
from mytable
where group_id = 'mygroup'
group by gender, generation, country, user_id
) t
group by gender, generation, country, yearquarter
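The first-appearance variant can be sketched the same way. This is an illustration in Python's sqlite3 module (SQLite 3.25 or later is assumed) with invented rows, and it nests one extra subquery instead of writing `sum(count(*))` directly, so treat it as an equivalent formulation rather than the statement above.

```python
import sqlite3

# Hypothetical sample: user u1 is active in two quarters, first seen in 2019-03.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (yearquarter TEXT, user_id TEXT, gender TEXT,
                      generation TEXT, country TEXT, group_id TEXT);
INSERT INTO mytable VALUES
  ('2019-03', 'u1', 'M', 'Y', 'FR', 'mygroup'),
  ('2019-03', 'u2', 'M', 'Y', 'FR', 'mygroup'),
  ('2019-04', 'u1', 'M', 'Y', 'FR', 'mygroup'),
  ('2020-04', 'u3', 'M', 'Y', 'FR', 'mygroup');
""")

# Innermost query: the first quarter in which each user appears.
# Middle query: how many users are new in each quarter.
# Outer query: running total of new users.
first_seen = conn.execute("""
SELECT yearquarter,
       SUM(new_users) OVER (PARTITION BY gender, generation, country
                            ORDER BY yearquarter) AS cnt
FROM (
  SELECT gender, generation, country, yearquarter, COUNT(*) AS new_users
  FROM (
    SELECT gender, generation, country, user_id,
           MIN(yearquarter) AS yearquarter
    FROM mytable
    WHERE group_id = 'mygroup'
    GROUP BY gender, generation, country, user_id
  ) AS firsts
  GROUP BY gender, generation, country, yearquarter
) AS per_quarter
ORDER BY yearquarter
""").fetchall()

print(first_seen)
```

Each user now contributes exactly once, so the last value equals the number of distinct users. A quarter with no new users (2019-04 in this sample) produces no row at all.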
Below is for BigQuery Standard SQL - built purely on top of your initial query with ARRAY_AGG replaced with STRING_AGG
#standardSQL
SELECT yearquarter, gender, generation, country,
(SELECT COUNT(DISTINCT id) FROM UNNEST(SPLIT(cumulative_users)) AS id) AS cumulative_number_of_users
FROM (
SELECT *,
STRING_AGG(users) OVER(PARTITION BY gender, generation, country ORDER BY yearquarter) AS cumulative_users
FROM (
SELECT
yearquarter, gender, generation, country,
STRING_AGG(DISTINCT user_id) AS users
FROM `project.dataset.table`
WHERE group_id= "mygroup"
GROUP BY yearquarter, gender, generation, country
)
)
-- ORDER BY yearquarter, gender, generation, country

SQL query to get instructors and students has invalid identifier

Hi, I have a schema that looks like this
I wrote two queries that are supposed to do the following:
Find the names of the top 4 instructors who have taught the most number of distinct courses. Display also the total number of courses taught.
Output columns: InstructorName, NumberOfCoursesTaught
Sort by: NumberOfCoursesTaught in descending order
Find the top 2 students who have taken the most number of courses.
Output columns: S_ID, StudentName, NumberOfCourses
Sort by: NumberOfCourses in descending order
For query 1, I wrote:
SELECT name AS InstructorName, count(course_id) AS NumberOfCourses
FROM Teaches
WHERE name IN (SELECT name FROM Instructor where Instructor.i_id = Teaches.i_id)
GROUP BY i_id
ORDER BY COUNT(course_id) DESC;
For query 2, I wrote
SELECT s_id as S_ID, name as StudentName, count(course_id) as NumberOfCourses
FROM Takes
WHERE name IN (SELECT name FROM Student WHERE Takes.s_id = Student.s_id)
GROUP BY s_id
ORDER BY COUNT(course_id) DESC;
Both say:
"NAME" Invalid identifier
I suggest using a different logic to build your queries. Here is a demonstration for the first query; from there, you should be able to create the second query (and maybe post it as an answer?).
Start with an aggregate query that computes the number of teaches per instructor id, looking at the Teaches table:
SELECT i_id, COUNT(*) cnt FROM Teaches GROUP BY i_id
Then rank each record by decreasing count, using window function ROW_NUMBER() :
SELECT i_id, cnt, ROW_NUMBER() OVER(ORDER BY cnt DESC) rn
FROM (SELECT i_id, COUNT(*) cnt FROM Teaches GROUP BY i_id) t
All that is left to do is get the instructor name (JOIN on Instructor) and keep only the top 4 records:
SELECT i.name InstructorName, x.cnt NumberOfCoursesTaught
FROM (
SELECT i_id, cnt, ROW_NUMBER() OVER(ORDER BY cnt DESC) rn
FROM (SELECT i_id, COUNT(*) cnt FROM Teaches GROUP BY i_id) t
) x
INNER JOIN Instructor i ON i.i_id = x.i_id
WHERE x.rn <= 4
ORDER BY x.cnt desc
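The three steps above can be tried end to end with a small sketch in Python's sqlite3 module (SQLite 3.25 or later is assumed). The table contents are invented, the cutoff is 2 instead of 4 to keep the sample small, and `COUNT(DISTINCT course_id)` is swapped in for `COUNT(*)` on the assumption that an instructor can have several Teaches rows for the same course.

```python
import sqlite3

# Hypothetical data: instructor 1 has two rows for course C1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Instructor (i_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Teaches (i_id INTEGER, course_id TEXT);
INSERT INTO Instructor VALUES (1, 'Kim'), (2, 'Lee'), (3, 'Wu');
INSERT INTO Teaches VALUES
  (1, 'C1'), (1, 'C1'), (1, 'C2'), (1, 'C3'),
  (2, 'C1'), (2, 'C4'),
  (3, 'C5');
""")

# Aggregate per instructor, rank by decreasing count, join for the name,
# then keep the top-ranked rows.
top_instructors = conn.execute("""
SELECT i.name AS InstructorName, x.cnt AS NumberOfCoursesTaught
FROM (
  SELECT i_id, cnt, ROW_NUMBER() OVER (ORDER BY cnt DESC) AS rn
  FROM (
    SELECT i_id, COUNT(DISTINCT course_id) AS cnt
    FROM Teaches
    GROUP BY i_id
  ) AS t
) AS x
INNER JOIN Instructor AS i ON i.i_id = x.i_id
WHERE x.rn <= 2
ORDER BY x.cnt DESC
""").fetchall()

print(top_instructors)
```

Selecting `name` through the join is what avoids the invalid identifier error, since Teaches itself has no `name` column.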

SQL Server query correction in nested query

SELECT campus,semester, AVG(CountOfStudents)
FROM
(
SELECT semester,year,campus, count(*) as CountOfStudents
FROM registration
GROUP BY semester, year, campus,student_id
) t
GROUP BY campus,semester
What I need to do is find the average number of students per semester for each campus.
My table structure is:
Table name - registration
student_id
campus
year
batch
semester
campus, year, semester and batch together identify unique records, whereas student_id may repeat. The query above gives the wrong answer.
Follow these steps:
remove student_id from the GROUP BY clause, and
add DISTINCT inside COUNT().
Query:
SELECT campus, semester, AVG(CountOfStudents)
FROM
(
SELECT semester, year, campus, count(DISTINCT student_id) as CountOfStudents
FROM registration
GROUP BY semester, year, campus
) t
GROUP BY campus, semester
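To see why the two changes matter, the sketch below runs both inner queries on a tiny invented data set using Python's sqlite3 module instead of SQL Server. The `average` helper and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical data: student 1 registered twice (two batches) in Fall 2020.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE registration (student_id INTEGER, campus TEXT, year INTEGER,
                           batch TEXT, semester TEXT);
INSERT INTO registration VALUES
  (1, 'North', 2020, 'A', 'Fall'),
  (1, 'North', 2020, 'B', 'Fall'),
  (2, 'North', 2020, 'A', 'Fall'),
  (3, 'North', 2021, 'A', 'Fall');
""")

def average(inner_sql):
    """Average CountOfStudents per campus and semester over an inner query."""
    return conn.execute(f"""
        SELECT campus, semester, AVG(CountOfStudents)
        FROM ({inner_sql}) AS t
        GROUP BY campus, semester
    """).fetchall()

# Grouping by student_id yields one row per student, counting registrations.
original = average("""
    SELECT semester, year, campus, COUNT(*) AS CountOfStudents
    FROM registration
    GROUP BY semester, year, campus, student_id
""")

# Counting distinct students per semester, year and campus.
fixed = average("""
    SELECT semester, year, campus, COUNT(DISTINCT student_id) AS CountOfStudents
    FROM registration
    GROUP BY semester, year, campus
""")

print(original)
print(fixed)
```

With this sample, Fall 2020 has 2 distinct students and Fall 2021 has 1, so the corrected query averages to 1.5, while the original averages registrations per student instead.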

SQL Server query multiple aggregate columns

I need to write a query in SQL Server to get data like this.
Essentially it is group by dept, race, gender and then
SUM(employees_of_race_by_gender),Sum(employees_Of_Dept).
I can get the data for the first four columns, but getting the total number of employees in each dept is proving difficult.
Could you please help me write the query?
All these details in same table Emp. Columns of Emp are Emp_Number, Race_Name,Gender,Dept
Your "num_of_emp_in_race" is actually by Gender too
SELECT DISTINCT
Dept,
Race_name,
Gender,
COUNT(*) OVER (PARTITION BY Dept, Race_name, Gender) AS num_of_emp_in_race,
COUNT(*) OVER (PARTITION BY Dept) AS num_of_emp_dept
FROM
MyTable
You should probably have this
COUNT(*) OVER (PARTITION BY Dept, Gender) AS PerDeptGender,
COUNT(*) OVER (PARTITION BY Dept, Race_name) AS PerDeptRace,
COUNT(*) OVER (PARTITION BY Dept, Race_name, Gender) AS PerDeptRaceGender,
COUNT(*) OVER (PARTITION BY Dept) AS PerDept
Edit: the DISTINCT appears to be applied before the COUNT (which would be odd based on this), so try this instead
SELECT DISTINCT
*
FROM
(
SELECT
Dept,
Race_name,
Gender,
COUNT(*) OVER (PARTITION BY Dept, Race_name, Gender) AS num_of_emp_in_race,
COUNT(*) OVER (PARTITION BY Dept) AS num_of_emp_dept
FROM
MyTable
) foo
Since the two sums you're looking for are based on a different aggregation, you need to calculate them separately and join the result. In such cases I first build the selects to show me the different results, making it easy to catch errors early:
SELECT Dept, Gender, race_name, COUNT(*) as num_of_emp_in_race
FROM Emp
GROUP BY Dept, Gender, race_name
SELECT Dept, COUNT(*) as num_of_emp_in_dept
FROM Emp
GROUP BY Dept
Afterwards, joining those two is pretty straightforward:
SELECT *
FROM ( first statement here ) as by_race
JOIN ( second statement here ) as by_dept ON (by_race.Dept = by_dept.Dept)
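As a quick check that the join of two aggregates and the window-function version agree, here is a sketch in Python's sqlite3 module (SQLite 3.25 or later is assumed, standing in for SQL Server). The Emp rows are invented for illustration.

```python
import sqlite3

# Hypothetical employees: three in HR, one in IT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp (Emp_Number INTEGER, Race_Name TEXT, Gender TEXT, Dept TEXT);
INSERT INTO Emp VALUES
  (1, 'A', 'F', 'HR'),
  (2, 'A', 'F', 'HR'),
  (3, 'B', 'M', 'HR'),
  (4, 'A', 'M', 'IT');
""")

# Variant 1: aggregate at two levels separately, then join on Dept.
joined = conn.execute("""
SELECT by_race.Dept, by_race.Race_Name, by_race.Gender,
       by_race.num_of_emp_in_race, by_dept.num_of_emp_in_dept
FROM (
  SELECT Dept, Race_Name, Gender, COUNT(*) AS num_of_emp_in_race
  FROM Emp
  GROUP BY Dept, Race_Name, Gender
) AS by_race
JOIN (
  SELECT Dept, COUNT(*) AS num_of_emp_in_dept
  FROM Emp
  GROUP BY Dept
) AS by_dept ON by_race.Dept = by_dept.Dept
ORDER BY by_race.Dept, by_race.Race_Name, by_race.Gender
""").fetchall()

# Variant 2: window counts on every row, then collapse duplicates.
windowed = conn.execute("""
SELECT DISTINCT Dept, Race_Name, Gender, num_of_emp_in_race, num_of_emp_in_dept
FROM (
  SELECT Dept, Race_Name, Gender,
         COUNT(*) OVER (PARTITION BY Dept, Race_Name, Gender) AS num_of_emp_in_race,
         COUNT(*) OVER (PARTITION BY Dept) AS num_of_emp_in_dept
  FROM Emp
) AS foo
ORDER BY Dept, Race_Name, Gender
""").fetchall()

print(joined)
print(windowed)
```

Both variants return one row per Dept, Race_Name and Gender combination, each carrying the department total alongside the finer-grained count.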