percentile_disc vs percentile_cont

percentile_disc vs percentile_cont - sql

What is difference between PERCENTILE_DISC and PERCENTILE_CONT,
I have a table ### select * from childstat
FIRSTNAME GENDER BIRTHDATE HEIGHT WEIGHT
-------------------------------------------------- ------ --------- ---------- ----------
lauren f 10-JUN-00 54 876
rosemary f 08-MAY-00 35 123
Albert m 02-AUG-00 15 923
buddy m 02-OCT-00 15 150
furkar m 05-JAN-00 76 198
simon m 03-JAN-00 87 256
tommy m 11-DEC-00 78 167
And I am trying differentiate between those percentile
select firstname,height,
percentile_cont(.50) within group (order by height) over() as pctcont_50_ht,
percentile_cont(.72) within group (order by height) over() as pctcont_72_ht,
percentile_disc(.50) within group (order by height) over () as pctdisc_50_ht,
percentile_disc(.72) within group (order by height) over () as pctdisc_72_ht
from childstat order by height
FIRSTNAME HEIGHT PCTCONT_50_HT PCTCONT_72_HT PCTDISC_50_HT PCTDISC_72_HT
-------------------------------------------------- ---------- ------------- ------------- ------------- -------------
buddy 15 54 76.64 54 78
Albert 15 54 76.64 54 78
rosemary 35 54 76.64 54 78
lauren 54 54 76.64 54 78
furkar 76 54 76.64 54 78
tommy 78 54 76.64 54 78
simon 87 54 76.64 54 78
But still can't understand how this two and what is use of those two functions..

PERCENTILE_DISC returns a value in your set/window, whereas PERCENTILE_CONT will interpolate;
In your query, when you use .72, PERCENTILE_CONT interpolates between 76 and 78, since 72% is neither one of them; PERCENTILE_DISC chooses 76 (the lowest of the ones)

I found this explanation very helpful
http://mfzahirdba.blogspot.com/2012/09/difference-between-percentilecont-and.html
ITEM REGION WK FORECASTQTY
---- ---------- ---------- -----------
TEST E 3 137
TEST E 2 190
TEST E 1 232
TEST E 4 400
SELECT
t.* ,
PERCENTILE_CONT(0.5)
WITHIN GROUP ( ORDER BY forecastqty)
OVER (PARTITION BY ITEM , region ) AS PERCENTILE_CONT ,
MEDIAN(forecastqty)
OVER (PARTITION BY ITEM , region ) AS MEDIAN ,
PERCENTILE_DISC(0.5)
WITHIN GROUP ( ORDER BY forecastqty)
OVER (PARTITION BY ITEM , region ) AS PERCENTILE_DISC
FROM
t ;
ITEM REGION WK FORECASTQTY PERCENTILE_CONT MEDIAN PERCENTILE_DISC
---- ---------- ---------- ----------- --------------- ---------- ---------------
TEST E 3 137 211 211 190
TEST E 2 190 211 211 190
TEST E 1 232 211 211 190
TEST E 4 400 211 211 190

Related

Summing column that is grouped - SQL

I have a query:
SELECT
date,
COUNT(o.row_number)FILTER (WHERE o.row_number > 1 AND date_ddr IS NOT NULL AND telephone_number <> 'Anonymous' ) repeat_calls_24h
(
SELECT
telephone_number,
date_ddr,
ROW_NUMBER() OVER(PARTITION BY ddr.telephone_number ORDER BY ddr.date) row_number,
FROM
table_a
)o
GROUP BY 1
Generating the following table:
date
Repeat calls_24h
17/09/2022
182
18/09/2022
381
19/09/2022
81
20/09/2022
24
21/09/2022
91
22/09/2022
110
23/09/2022
231
What can I add to my query to provide a sum of the previous three days as below?:
date
Repeat calls_24h
Repeat Calls 3d
17/09/2022
182
18/09/2022
381
19/09/2022
81
644
20/09/2022
24
486
21/09/2022
91
196
22/09/2022
110
225
23/09/2022
231
432
Thanks

We can do it using lag.
select "date"
,"Repeat calls_24h"
,"Repeat calls_24h" + lag("Repeat calls_24h") over(order by "date") + lag("Repeat calls_24h", 2) over(order by "date") as "Repeat Calls 3d"
from t
date
Repeat calls_24h
Repeat Calls 3d
2022-09-17
182
null
2022-09-18
381
null
2022-09-19
81
644
2022-09-20
24
486
2022-09-21
91
196
2022-09-22
110
225
2022-09-23
231
432
Fiddle

Top 5 records using SQL query

I'm trying to retrieve top 5 marks from school database in databricks deltatable using SQL query. So I wrote following query
select
rs.State, rs.Year, rs.CW, rs.Country, rs.SchoolName,
rs.EducationSystem, rs.MarksS1, rs.MarksS2, rs.MarksS3, rs.MarksS4,
rs.TotalMarks, rs.group_rank
from
(select
State, Year, Country, SchoolName, EducationSystem,
MarksS1, MarksS2, MarksS3, MarksS4, TotalMarks,
row_number() over (partition by State, Year, Country, SchoolName, EducationSystem
order by TotalMarks DESC, MarksS1 ASC) as group_rank
from
nm_combined_historical_data_distinct) rs
where
group_rank < 6
I'm trying to get top 5 students based on marks. But if top 5 students have same marks, I want my output like this
State year Country School Name Education System MarksS1 MarksS2 MarksS3 MarksS4 Total
AZ 2020 US XYZ ABC 95 91 92 95 373
AZ 2020 US XYZ ABC 95 91 92 95 373
AZ 2020 US XYZ ABC 95 91 92 95 373
AZ 2020 US XYZ ABC 95 91 92 95 373
AZ 2020 US XYZ ABC 95 91 92 95 373
But my output is coming in this way
State year Country School Name Education System MarksS1 MarksS2 MarksS3 MarksS4 Total
AZ 2020 US XYZ ABC 95 91 92 95 373
Can you suggest me how do I get my desired output

Instead of row_number() use rank():
rank() over (partition by State, Year, Country, SchoolName, EducationSystem
order by TotalMarks DESC, MarksS1 ASC
) as group_rank

how to order student rank on the basis of obtain marks on different subject in sql

Here is our table
name math physics chemistry hindi english
pk 85 65 45 54 40
ashis 87 44 87 78 74
rohit 77 47 68 63 59
mayank 91 81 78 47 84
komal 47 51 73 61 55
we want to result show as (summing the grades essentially)
rank name total
1 mayank 381
2 ashis 370
3 rohit 314
4 pk 289
5 komal 287

SET #rank=0;
SELECT #rank:=#rank+1 AS rank,name,(math+physics+chemistry+hindi+english) as total
FROM tablename ORDER BY total DESC
this will produce your desired result as
rank | name | total
--------------------
1 | mayank | 381
2 | ashis | 370
for more details take a look mysql ranking results

Try this
SELECT #curRank := #curRank + 1 AS rank, name, (math + physics + chemistry + hindi + history) AS total FROM table, (SELECT #curRank := 0) r ORDER BY total DESC;
This will sum all the fields and sort them by descending order and add a rank.
By doing SELECT #curRank := 0 you can keep it all in one SQL statement without having to do a SET first.

how to update duplicate rows in a column to a new values

I will explain my problem briefly
have duplicate rino like below (actually this rino is the serial number in front end)
chqid rino branchid
----- ---- --------
876 6 2
14 6 2
18 10 2
828 10 2
829 11 2
19 11 2
830 12 2
20 12 2
78 40 2
1092 40 2
1094 41 2
79 41 2
413 43 2
1103 43 2
82 44 2
1104 44 2
1105 45 2
83 45 2
91 46 2
1106 46 2
here in my case I don't want to delete these duplicate rino instead of that I planned to update the rino having max date(column not specified in the above sample actually a date column is there) to the next rino number
what exactly I meant is :
if I sort out the above result according to the max(date) I will get
chqid rino branchid
----- ---- --------
876 6 2
828 10 2
19 11 2
830 12 2
1092 40 2
79 41 2
413 43 2
82 44 2
83 45 2
1106 46 2
(NOTE : total number of duplicate rows are 10 in branchid=2)
the last entered rino in the table for branchid=2 is 245
So I just want to update the 10 rows(Of column rino) with numbers starting from 246 to 255( just added 245+10 like this select lastno+ generate_series(1,10) nos from tab where cola=4 and branchid = 2 and vrid=20;)
Expected Output:
chqid rino branchid
----- ---- --------
876 246 2
828 247 2
19 248 2
830 249 2
1092 250 2
79 251 2
413 252 2
82 253 2
83 254 2
1106 255 2
using postgresql

Finally I found a solution, am using dynamic-sql to solve my issue
do
$$
declare
arow record;
begin
for arow in
select chqid,rino,branchid from (
select chqid,rino::int ,vrid,branchid ,row_number()over (partition by rino::int ) rn
from tab
where vrid =20
and branchid = 2)t
where rn >1
loop
execute format('
update tab
set rino=(select max(rino::int)+1 from gtab19 where acyrid=4 and branchid = 2 and vrid=20)
where chqid=%s
',arow.chqid);
end loop;
end;
$$;

adding columns to get total and ranking total

this my table
student_numbers
ROLL_NO NAME CLASS HINDI MATHS SCIENCE
2 amit 11 91 91 81
3 anirudh 11 88 87 81
4 akash 11 82 81 85
5 pratik 10 81 99 98
7 rekha 10 79 97 82
6 neha 10 89 91 90
8 kamal 10 66 68 69
1 ankit 11 97 98 87
i want to add last three columns and rank on that total partitioned by class
this is what i tried
select roll_no,name,class,total,
rank() over (partition by class order by total desc) as rank
from student_numbers,(select hindi+maths+science total from student_numbers)
;
but this is showing a very large table,with duplicate student name having different total .

I'm not exactly sure what you are trying to accomplish -- order the highest grades by class? If so, something like this should work:
SELECT SN.Roll_No,
SN.Class,
SN2.Total,
RANK() OVER (PARTITION BY SN.Class ORDER BY SN2.Total DESC) as rank
FROM Student_Numbers SN
JOIN (
SELECT
Roll_no, hindi+maths+science as Total
FROM Student_Numbers
) SN2 ON SN.Roll_No = SN2.Roll_No
Here is the SQL Fiddle.
Good luck.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

percentile_disc vs percentile_cont - sql

PERCENTILE_DISC returns a value in your set/window, whereas PERCENTILE_CONT will interpolate; In your query, when you use .72, PERCENTILE_CONT interpolates between 76 and 78, since 72% is neither one of them; PERCENTILE_DISC chooses 76 (the lowest of the ones)

Related

Summing column that is grouped - SQL

Top 5 records using SQL query

how to order student rank on the basis of obtain marks on different subject in sql

how to update duplicate rows in a column to a new values

adding columns to get total and ranking total

Categories

Resources