Convert table into grouped statistics of same table - sql

In MS-SQL, I have a table hasStudied(sid, ccode, grade) (student id, course code, grade) which keeps track of the past courses a student has studied and the grade they've gotten.
As output of my query, I want to return a list of courses, with the percentage of passing (= not 'F') students in the column next to it, in descending order by that percentage, like this:
C1 : 85
C3 : 70
C2 : 67
etc.
So far I have managed to produce two separate result sets: one with the course code and the number of students who passed the course, and one with the course code and the number of students who took the course.
This takes two relatively simple statements, but then requires a lot of inefficient calculation in Java.
Is there any way to do this in a single query?

Assuming you do not have two entries for the same student under one course, this should do it:
SELECT
    ccode,
    ROUND(CAST(passed AS numeric(15,2)) / CAST(taken_course AS numeric(15,2)) * 100, 0) AS percentage_passed
FROM (
    SELECT
        ccode,
        SUM(CASE WHEN grade <> 'F' THEN 1 ELSE 0 END) AS passed,  -- passing = any grade other than 'F'
        COUNT(*) AS taken_course
    FROM hasStudied
    GROUP BY ccode
) foo
ORDER BY percentage_passed DESC;  -- highest pass rate first, as requested

I think you are looking for a CTE:
create table #temp(StId int, ccode varchar(5), grade varchar(1))
insert into #temp Values (1,'A1','A'),(1,'A1','F'),(2,'A2','B'),(3,'A2','F'),(4,'A2','F'),(4,'A3','F'),(5,'A3','F')
;with cte as (
select ccode
from #temp
group by ccode
)
select cte.ccode,ratioOfPass = cast(sum(case when t.grade <> 'F' then 1.0 else 0.0 end) as float) / count(*)
from cte
inner join #temp t on t.ccode = cte.ccode
group by cte.ccode
When calculating the ratio, use SUM with CASE WHEN, and do not forget to cast the sum to float so that the division is not done in integer arithmetic.
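As a minimal sketch of the single-query idea, the conditional aggregation used in the answers above can be tried end to end. The example uses Python's built-in sqlite3 module as a stand-in for MS-SQL (an assumption made only so the snippet is runnable); the sample rows and the function name pass_percentages are hypothetical.

```python
import sqlite3

def pass_percentages(rows):
    """Return [(ccode, percent_passed)], highest pass rate first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hasStudied (sid INTEGER, ccode TEXT, grade TEXT)")
    conn.executemany("INSERT INTO hasStudied VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT ccode,
               -- 100.0 forces floating-point arithmetic before the division
               100.0 * SUM(CASE WHEN grade <> 'F' THEN 1 ELSE 0 END) / COUNT(*) AS pct_passed
        FROM hasStudied
        GROUP BY ccode
        ORDER BY pct_passed DESC
        """
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample data: (sid, ccode, grade)
sample = [
    (1, "C1", "A"), (2, "C1", "B"), (3, "C1", "F"), (4, "C1", "C"),
    (1, "C2", "F"), (2, "C2", "F"),
    (3, "C3", "B"), (4, "C3", "B"),
]

for ccode, pct in pass_percentages(sample):
    print(ccode, ":", round(pct))
```

The whole calculation happens in one grouped query, so no post-processing in application code is needed beyond formatting the output.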

Related

SQL How to select data that has all values above a value

Say I have a Table called "MARKS" with the columns: Value, subject_id and student_id.
Now I want to write a query to display the names of all students who have secured more than 50 in ALL subjects that they have appeared in.
How can that be achieved?
Example:
Lets say the subjects are maths, english, history.
Even if a student scores 100 in maths, 100 in english but 40 in history, he should be considered as failed and not be displayed.
There are several ways to get what you expect, but in the simplest case the HAVING clause will do. The following query groups by student_id, so MIN returns the lowest mark over all subjects for each student, and HAVING keeps only the students whose lowest mark is above 50:
SELECT student_id
FROM MARKS
GROUP BY student_id
HAVING MIN(Value) > 50;
Then join student names by student_id.
I would say:
select student_id
from MARKS
where student_id not in (
    select student_id
    from MARKS
    where value <= 50
)
Beware: if you have NULLs in student_id you will receive incorrect results. Get around this with a coalesce() in the sub-select.
This returns every student and subject row with a mark above 50:
select student_id, subject_id, value from marks
where
(case when value <= 50 then 'failed' else 'pass' end) = 'pass'
Alternatively, using a derived table:
select *
from
(
    select student_id, MIN(Value) as min_value
    from MARKS
    group by student_id
) lowest
where
min_value > 50
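The HAVING MIN(...) approach, including the final join to student names, can be sketched as follows. This is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module, and the students table, its columns, the sample rows, and the function name are all hypothetical, since the question does not show where the names are stored.

```python
import sqlite3

def students_passing_everything(marks, students):
    """Return names of students whose lowest mark is above 50."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE marks (value INTEGER, subject_id INTEGER, student_id INTEGER)")
    # Hypothetical lookup table for the names
    conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO marks VALUES (?, ?, ?)", marks)
    conn.executemany("INSERT INTO students VALUES (?, ?)", students)
    rows = conn.execute(
        """
        SELECT s.name
        FROM students s
        JOIN (
            SELECT student_id
            FROM marks
            GROUP BY student_id
            HAVING MIN(value) > 50      -- every mark must exceed 50
        ) p ON p.student_id = s.student_id
        ORDER BY s.name
        """
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]

# Hypothetical sample data: (value, subject_id, student_id)
sample_marks = [
    (100, 1, 1), (100, 2, 1), (40, 3, 1),   # one mark below 50: excluded
    (60, 1, 2), (75, 2, 2), (90, 3, 2),     # all marks above 50: included
    (50, 1, 3), (90, 2, 3),                 # exactly 50 is not "more than 50"
]
sample_students = [(1, "Ann"), (2, "Bea"), (3, "Cy")]

print(students_passing_everything(sample_marks, sample_students))
```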

query optimization (nested subqueries)

I am trying to simplify the subqueries below to improve a SELECT statement. I have a table with three basic columns: ID, GRADE and AGE. I want to select all records that have the same GRADE as the record with the maximum ID.
Does somebody have a better way than nested subqueries? All suggestions are welcome.
Note: my apologies for the formatting of the table.
ID GRADE AGE
10 A 30
12 B 45
13 A 15
09 B 14
20 A 12
SELECT
*
FROM
TABLE
WHERE
GRADE = (
SELECT
grade
FROM
TABLE
WHERE
id = (SELECT MAX(id) FROM TABLE)
);
You could use a CTE to make the query easier to read:
WITH cte AS
(
SELECT GRADE,
ROW_NUMBER() OVER (ORDER BY ID DESC) AS RowNum
FROM yourTable
)
SELECT *
FROM yourTable
WHERE GRADE = (SELECT t.GRADE FROM cte t WHERE t.RowNum = 1)
However, I don't have a problem with your original approach because the subqueries are not correlated to anything. What I mean by this is that
SELECT MAX(id) FROM yourTable
should effectively only be executed once, and afterwards sort of be treated as a constant. Similarly, the query
SELECT grade FROM TABLE WHERE id = (max from above query)
should also be executed only once. This assumes that the query optimizer is smart enough to figure this out, which it probably is.
You can do the following (not much simpler though):
SELECT
*
FROM
TABLE
WHERE
GRADE IN (
SELECT
first_value (GRADE) over (ORDER BY id DESC)
FROM
TABLE
)
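To see what the nested-subquery form from the question returns, here is a small sketch that runs it against the sample table shown there. It uses SQLite via Python's sqlite3 module rather than the asker's database, and the table name grades and the function name are assumptions.

```python
import sqlite3

def rows_sharing_grade_of_max_id(rows):
    """Return all rows whose grade equals the grade of the row with the largest id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grades (id INTEGER, grade TEXT, age INTEGER)")
    conn.executemany("INSERT INTO grades VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, grade, age
        FROM grades
        WHERE grade = (
            SELECT grade
            FROM grades
            WHERE id = (SELECT MAX(id) FROM grades)  -- uncorrelated, evaluated once
        )
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return result

# Sample table from the question: (ID, GRADE, AGE)
sample_rows = [(10, "A", 30), (12, "B", 45), (13, "A", 15), (9, "B", 14), (20, "A", 12)]

print(rows_sharing_grade_of_max_id(sample_rows))
```

The maximum ID is 20 with grade A, so the three grade-A rows come back.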

Pivot for the same gender

How do I do multiple pivots? I would like to achieve the result highlighted below.
For each Grade and each Gender, I would like to have the TotalA and TotalB values aligned in 4 columns in a single row. My final result needs to contain all 10 columns shown below.
My desired output [needs to contain 2 rows, with the GENDER column retained]:
I tried the query below, but it removed my Gender column and could not pivot the 2 columns (TotalA, TotalB) into 4 additional columns at the same time.
SELECT *,
[TotalA_Male] = [M],
[TotalB_Female] = [F]
FROM
(
SELECT * FROM table) AS s
PIVOT
(
MAX(TotalA) FOR [Gender] IN ([M],[F])
) AS p
I don't think you want a pivot at all. You are looking for partial sums of your total columns, grouped by some key columns (it looks like Country and Grade in this case). Window functions let you compute these partial sums. However, they won't filter by gender on their own, so you'll need a CASE expression inside the SUM() to include only male or female rows in each partial sum:
SELECT *,
SUM(CASE WHEN Gender = 'M' THEN TotalA ELSE 0 END) OVER(PARTITION BY Country, Grade) AS TotalA_Male,
SUM(CASE WHEN Gender = 'F' THEN TotalA ELSE 0 END) OVER(PARTITION BY Country, Grade) AS TotalA_Female,
SUM(CASE WHEN Gender = 'M' THEN TotalB ELSE 0 END) OVER(PARTITION BY Country, Grade) AS TotalB_Male,
SUM(CASE WHEN Gender = 'F' THEN TotalB ELSE 0 END) OVER(PARTITION BY Country, Grade) AS TotalB_Female
FROM totals
See also: https://msdn.microsoft.com/en-us/library/ms189461.aspx
Basically, the window functions let you do a GROUP BY as part of a single column expression in the SELECT list. The result of the aggregate and group by is included in every row, just as if it were any other expression. Note how there is no GROUP BY or PIVOT in the rest of the query. The PARTITION BY in the OVER() clause works like a GROUP BY, specifying how to group the rows in the resultset for the purposes of performing the specified aggregation (in this case, SUM()).
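The windowed conditional sums described above can be sketched on the four sample rows that appear in the other answer. This is a hedged illustration, not the asker's schema: it runs on SQLite (3.25 or later, for window function support) through Python's sqlite3 module, partitions by grade only because the sample rows have no Country column, and the function name is hypothetical.

```python
import sqlite3

def gender_totals(rows):
    """Return one row per (grade, gender) with all four per-grade totals attached."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE totals (grade INTEGER, gender TEXT, totalA REAL, totalB REAL)")
    conn.executemany("INSERT INTO totals VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT grade, gender,
               SUM(CASE WHEN gender = 'M' THEN totalA ELSE 0 END) OVER (PARTITION BY grade) AS totalA_male,
               SUM(CASE WHEN gender = 'F' THEN totalA ELSE 0 END) OVER (PARTITION BY grade) AS totalA_female,
               SUM(CASE WHEN gender = 'M' THEN totalB ELSE 0 END) OVER (PARTITION BY grade) AS totalB_male,
               SUM(CASE WHEN gender = 'F' THEN totalB ELSE 0 END) OVER (PARTITION BY grade) AS totalB_female
        FROM totals
        ORDER BY grade, gender
        """
    ).fetchall()
    conn.close()
    return result

# Sample values: (grade, gender, totalA, totalB)
sample_totals = [
    (1, "F", 7.11321, 15.55607),
    (1, "M", 6.31913, 15.50801),
    (2, "F", 5.26457, 6.94687),
    (2, "M", 6.34666, 9.29783),
]

for row in gender_totals(sample_totals):
    print(row)
```

Both rows of a grade carry the same four totals, and the gender column is still present because nothing was grouped away.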
You can only pivot on a single column, so what you need to do is unpivot the TotalA and TotalB columns into rows, then build a single column from the gender and the total name, and use that column in a pivot:
select * from (
select
grade,
/* combine the columns for a pivot */
total_gender_details = details + '_' + gender,
totals
from
(values
(1, 'F', cast(7.11321 as float), cast(15.55607 as float)),
(1, 'M', 6.31913, 15.50801),
(2, 'F', 5.26457, 6.94687),
(2, 'M', 6.34666, 9.29783)
) t(grade,gender,totalA,totalB)
/* unpivot the totals into rows */
unpivot (
totals
for details in ([totalA], [totalB])
) up
) t
pivot (
sum(totals)
for total_gender_details in ([totalA_M],[totalA_F],[totalB_M],[totalB_F])
) p

Keeping Unique Rows with Group By Cube

Suppose I have data that includes the SSN of a student, the college campus they attended, and their wages for a given year. Like so...
create table #thetable (SSN int, campus int, wage int);
insert into #thetable(SSN, campus, wage)
values
(111111111,1,100),
(111111111,2,100),
(222222222,1,250),
(222222222,2,250),
(333333333,1,50),
(444444444,2,400);
Now, I want to get the average wage of the students at each campus, and the average wage of students from all campuses put together... So I do something like this:
select campus, avg(wage)
from #thetable
group by cube(campus);
The problem is that I don't want to double-count the students who attended two campuses when I'm grouping the campuses together. This is the output I'm getting (it double-counts students 111111111 and 222222222):
Campus (no column name)
1 133
2 250
NULL 191
My desired output is this (no double counting):
Campus (no column name)
1 133
2 250
NULL 200
Can this be accomplished without using multiple queries and the UNION operator? If so, how? (Incidentally, I realize that this table is not normalized... would normalizing help?)
You can't do this with one column. The cube is going to rollup the values based on the calculations on each line. So, if a row is included in one calculation, it will be included in the sum.
You can do this, though, by weighting each row by 1 divided by the number of rows for that student. This "divides" a student equally across the campuses, so that each student's weights add up to 1:
select campus, avg(wage) as avg_wage, sum(wage*weight) / sum(weight) avg_wage_weighted
from (select t.*, (1.0 / count(*) over (partition by SSN)) as weight
from #thetable t
) t
group by cube(campus);
The avg_wage_weighted column gives the value you want for the all-campus (NULL) row, while the per-campus rows should still use avg_wage. You can embed this in a further subquery to get one column:
select campus, (case when campus is null then avg_wage_weighted else avg_wage end)
from (select campus, avg(wage) as avg_wage, sum(wage*weight) / sum(weight) avg_wage_weighted
from (select t.*, (1.0 / count(*) over (partition by SSN)) as weight
from #thetable t
) t
group by cube(campus)
) t
Here is a SQL Fiddle showing the solution.
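The weighting idea can be checked on the sample data with a short sketch. It uses SQLite (3.25 or later) via Python's sqlite3 module, which is an assumption made for runnability; SQLite has no GROUP BY CUBE, so the sketch only computes the all-campus row, and the function name is hypothetical. Note also that SQLite's AVG returns a floating-point value, so the naive figure is not truncated to 191 here.

```python
import sqlite3

def overall_average_wages(rows):
    """Return (naive_avg, weighted_avg) over all campuses combined."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE thetable (ssn INTEGER, campus INTEGER, wage INTEGER)")
    conn.executemany("INSERT INTO thetable VALUES (?, ?, ?)", rows)
    naive, weighted = conn.execute(
        """
        SELECT AVG(wage) AS avg_wage,
               SUM(wage * weight) / SUM(weight) AS avg_wage_weighted
        FROM (
            -- each student's rows share a total weight of 1
            SELECT wage, 1.0 / COUNT(*) OVER (PARTITION BY ssn) AS weight
            FROM thetable
        ) AS t
        """
    ).fetchone()
    conn.close()
    return naive, weighted

# Sample data from the question: (SSN, campus, wage)
sample_wages = [
    (111111111, 1, 100), (111111111, 2, 100),
    (222222222, 1, 250), (222222222, 2, 250),
    (333333333, 1, 50),
    (444444444, 2, 400),
]

naive_avg, weighted_avg = overall_average_wages(sample_wages)
print(naive_avg, weighted_avg)
```

The weighted sum is 800 over a total weight of 4, which gives the desired 200, while the naive average counts the two-campus students twice.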
Figured it out with a correlated sub-query. Works for me.
select campus,
(
select avg(wage)
from
(
select ssn, campus, wage, row_number() over(partition by SSN order by wage) as RN
from #thetable as inside
where (inside.campus=outside.campus or outside.campus is null)
) as middle
where RN=1
)
from #thetable outside
group by cube(campus);

Avg Sql Query Always Returns int

I have one column for Farmer Names and one column for Town Names in my table TRY.
I want to find Average_Number_Of_Farmers_In_Each_Town.
Select TownName ,AVG(num)
FROM(Select TownName,Count(*) as num From try Group by TownName) a
group by TownName;
But this query always returns int values. How can I get float values instead?
;WITH [TRY]([Farmer Name], [Town Name])
AS
(
SELECT N'Johny', N'Bucharest' UNION ALL
SELECT N'Miky', N'Bucharest' UNION ALL
SELECT N'Kinky', N'Ploiesti'
)
SELECT AVG(src.Cnt) AS Average
FROM
(
SELECT COUNT(*)*1.00 AS Cnt
FROM [TRY]
GROUP BY [TRY].[Town Name]
) src
Results:
Average
--------
1.500000
Without the * 1.00, the result would be 1: AVG over the INT values 2 and 1 gives 1.5, which is truncated to INT 1 (see the Return Types section of the AVG documentation).
Your query is always returning int logically because the average is not doing anything. Both the inner and the outer queries are grouping by town name -- so there is one value for each average, and that average is the count.
If you are looking for the overall average, then something like:
Select AVG(cast(cnt as float))
FROM (Select TownName, Count(*) as cnt
From try
Group by TownName
) t
You can also do this without the subquery as:
select cast(count(*) as float) /count(distinct TownName)
from try;
EDIT:
The assumption was that each farmer in the town has one row in try. Are you just trying to count the number of distinct farmers in each town? Assuming you have a field like FarmerName that identifies a given farmer, that would be:
select TownName, count(distinct FarmerName)
from try
group by TownName;
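The effect of casting before dividing can be sketched with the three sample farmers from the first answer. This runs on SQLite via Python's sqlite3 module, which is an assumption: SQLite's AVG always returns a floating-point value, unlike SQL Server's AVG over INT, so the truncation is shown here through integer division instead. The table name farmers, its column names, and the function name are hypothetical.

```python
import sqlite3

def farmers_per_town(rows):
    """Return (integer_division_result, float_division_result) for farmers per town."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE farmers (farmer_name TEXT, town_name TEXT)")
    conn.executemany("INSERT INTO farmers VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT COUNT(*) / COUNT(DISTINCT town_name) AS int_avg,               -- integer / integer
               CAST(COUNT(*) AS REAL) / COUNT(DISTINCT town_name) AS float_avg -- cast first
        FROM farmers
        """
    ).fetchone()
    conn.close()
    return result

# Sample rows from the answer above: (farmer name, town name)
sample_farmers = [("Johny", "Bucharest"), ("Miky", "Bucharest"), ("Kinky", "Ploiesti")]

print(farmers_per_town(sample_farmers))
```

Three farmers across two towns give 1 with integer arithmetic and 1.5 once one operand is cast to a floating-point type.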