postgres use of rank function

I am new to SQL queries. I need to build a query that ranks each student by the number of tests on which they scored 100%, divided by the total number of tests they have taken, considering only tests from the last 10 days. Here is my table structure:
CREATE TABLE student(
id serial NOT NULL,
student_email varchar NULL,
student_name varchar NULL,
test_subject varchar NULL,
total_question varchar NULL,
total_passed varchar NULL,
total_failed varchar NULL,
total_skipped varchar NULL,
test_time timestamp NULL,
CONSTRAINT student PRIMARY KEY (id));
If a test has a non-zero total_failed or total_skipped, it is not counted as 100%.
Sample data looks like this:
1 j#b.com john maths 10 10 0 0 2019-08-20 21:00:00
2 j#b.com john maths 10 10 0 0 2019-08-19 21:00:00
3 j#b.com john maths 10 09 1 0 2019-08-18 21:00:00
4 j#b.com john english 10 10 0 0 2019-08-20 21:00:00
5 j#b.com john english 10 10 0 0 2019-08-19 21:00:00
6 j#b.com john english 10 09 0 1 2019-08-20 21:00:00
7 p#b.com paul maths 10 10 0 0 2019-08-20 21:00:00
8 p#b.com paul maths 10 10 0 0 2019-08-19 21:00:00
9 p#b.com paul maths 10 10 0 0 2019-08-18 21:00:00
10 k#b.com koki maths 10 10 0 0 2019-06-20 21:00:00
11 k#b.com koki english 10 10 0 0 2019-06-20 21:00:00
12 k#b.com koki science 10 10 0 0 2019-08-20 21:00:00
13 k#b.com koki maths 10 08 2 0 2019-08-20 21:00:00
14 k#b.com koki english 10 10 0 0 2019-08-20 21:00:00
From the above data set, I need to consider only the rows from the last 10 days and compute the "RANK" as the number of tests with 100% divided by the total number of tests, for every distinct (test_subject, student) pair.
The output for the above data set should be:
koki science 100% k#b.com
koki english 100% k#b.com
paul maths 100% p#b.com
john maths 66.6% j#b.com
john english 66.6% j#b.com
koki maths 0% k#b.com
Any help appreciated

When I translate your definition of rank:
total number of test with 100% divided by total number of test
as
COUNT(*) FILTER (WHERE total_passed = total_question) / COUNT(*)::real
and apply the filter for the last 10 days:
test_time > CURRENT_DATE - interval '10 days'
I end up with the following query
SELECT
student_name
, test_subject
, COUNT(*) FILTER (WHERE total_passed = total_question) / COUNT(*)::real student_rank
FROM student
WHERE test_time > CURRENT_DATE - interval '10 days'
GROUP BY 1, 2
ORDER BY 3 desc;
I get the desired output:
student_name | test_subject | student_rank
--------------+--------------+-------------------
paul | maths | 1
koki | english | 1
koki | science | 1
john | maths | 0.666666666666667
john | english | 0.666666666666667
koki | maths | 0
(6 rows)

You can use conditional aggregation. In this case, avg() should work. Because the counters are declared as varchar, cast them to int first:
select student_name, test_subject,
avg( (total_failed::int + total_skipped::int = 0)::int ) as ratio_passed
from student
where test_time > now() - interval '10 days'
group by student_name, test_subject;
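To sanity-check the conditional-aggregation idea without a Postgres instance, here is a minimal sketch using Python's sqlite3 module. SQLite has no boolean-to-int cast or interval literals, so the sketch swaps in a CASE expression and datetime(). The integer column types, the trimmed sample rows, and the fixed as_of date are assumptions made so that the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_name TEXT, test_subject TEXT, "
    "total_failed INTEGER, total_skipped INTEGER, test_time TEXT)"
)
rows = [
    ("john", "maths", 0, 0, "2019-08-20 21:00:00"),
    ("john", "maths", 0, 0, "2019-08-19 21:00:00"),
    ("john", "maths", 1, 0, "2019-08-18 21:00:00"),
    ("paul", "maths", 0, 0, "2019-08-20 21:00:00"),
    ("koki", "maths", 0, 0, "2019-06-20 21:00:00"),  # outside the 10-day window
    ("koki", "maths", 2, 0, "2019-08-20 21:00:00"),
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", rows)

# Assumption: a fixed "now" instead of the real clock, because the data is from 2019.
as_of = "2019-08-21 00:00:00"

ratios = conn.execute(
    """
    SELECT student_name, test_subject,
           AVG(CASE WHEN total_failed + total_skipped = 0
                    THEN 1.0 ELSE 0.0 END) AS ratio_passed
    FROM student
    WHERE test_time > datetime(?, '-10 days')
    GROUP BY student_name, test_subject
    ORDER BY ratio_passed DESC, student_name
    """,
    (as_of,),
).fetchall()

for name, subject, ratio in ratios:
    print(name, subject, round(ratio, 3))
```

The average of a 0/1 flag is the share of rows where the flag is 1, which is why avg() gives the same ratio as dividing two counts.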

Related

Filter rows of a table based on a condition that implies: 1) value of a field within a range 2) id of the business and 3) date?

I want to filter a TableA, taking into account only those rows whose "TotalInvoice" field is within the minimum and maximum values expressed in a ViewB, based on month and year values and RepairShopId (the sample data only has one RepairShopId, but all the data has multiple IDs).
In the view I have minimum and maximum values for each business and each month and year.
TableA
RepairOrderDataId  RepairShopId  LastUpdated              TotalInvoice
1                  10            2017-06-01 07:00:00.000  765
1                  10            2017-06-05 12:15:00.000  765
2                  10            2017-02-25 13:00:00.000  400
3                  10            2017-10-19 12:15:00.000  295679
4                  10            2016-11-29 11:00:00.000  133409.41
5                  10            2016-10-28 12:30:00.000  127769
6                  10            2016-11-25 16:15:00.000  122400
7                  10            2016-10-18 11:15:00.000  1950
8                  10            2016-11-07 16:45:00.000  79342.7
9                  10            2016-11-25 19:15:00.000  1950
10                 10            2016-12-09 14:00:00.000  111559
11                 10            2016-11-28 10:30:00.000  106333
12                 10            2016-12-13 18:00:00.000  23847.4
13                 10            2016-11-01 17:00:00.000  22782.9
14                 10            2016-10-07 15:30:00.000  NULL
15                 10            2017-01-06 15:30:00.000  138958
16                 10            2017-01-31 13:00:00.000  244484
17                 10            2016-12-05 09:30:00.000  180236
18                 10            2017-02-14 18:30:00.000  92752.6
19                 10            2016-10-05 08:30:00.000  161952
20                 10            2016-10-05 08:30:00.000  8713.08
ViewB
RepairShopId  Orders  Average       MinimumValue      MaximumValue      year  month  yearMonth
10            1       370343        370343            370343            2015  7      2015-7
10            1       109645        109645            109645            2015  10     2015-10
10            1       148487        148487            148487            2015  12     2015-12
10            1       133409.41     133409.41         133409.41         2016  3      2016-3
10            1       19261         19261             19261             2016  8      2016-8
10            4       10477.3575    2656.65644879821  18298.0585512018  2016  9      2016-9
10            69      15047.709565  10                90942.6052417394  2016  10     2016-10
10            98      22312.077244  10                147265.581935242  2016  11     2016-11
10            96      20068.147395  10                99974.1750708773  2016  12     2016-12
10            86      25334.053372  10                184186.985160105  2017  1      2017-1
10            69      21410.63855   10                153417.00126689   2017  2      2017-2
10            100     13009.797     10                59002.3589332934  2017  3      2017-3
10            101     11746.191287  10                71405.3391452842  2017  4      2017-4
10            123     11143.49756   10                55306.8202091131  2017  5      2017-5
10            197     15980.55406   10                204538.144334771  2017  6      2017-6
10            99      10852.496969  10                63283.9899761938  2017  7      2017-7
10            131     52601.981526  10                1314998.61355187  2017  8      2017-8
10            124     10983.221854  10                59444.0535811233  2017  9      2017-9
10            115     12467.148434  10                72996.6054527277  2017  10     2017-10
10            123     14843.379593  10                129673.931373139  2017  11     2017-11
10            111     8535.455945   10                50328.1495501884  2017  12     2017-12
I've tried:
SELECT *
FROM TableA
INNER JOIN ViewB ON TableA.RepairShopId = ViewB.RepairShopId
WHERE TotalInvoice > MinimumValue AND TotalInvoice < MaximumValue
AND TableA.RepairShopId = ViewB.RepairShopId
But I'm not sure how to compare the yearMonth field with the datetime field "LastUpdated".
Any help is very appreciated!

Here is how you can do it. I assumed that LastUpdated is the date column in TableA to match against the view's year and month:
SELECT *
FROM TableA A
INNER JOIN ViewB B
ON A.RepairShopId = B.RepairShopId
AND A.TotalInvoice > B.MinimumValue
AND A.TotalInvoice < B.MaximumValue
AND YEAR(LastUpdated) = B.year
AND MONTH(LastUpdated) = B.month
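The join above can be tried out with a small sketch in Python's sqlite3 module. This is only an illustration under assumptions: SQLite has no YEAR() or MONTH(), so strftime() with a cast stands in for them, ViewB is modelled as a plain table, and only a few rows and columns from the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (RepairOrderDataId INTEGER, RepairShopId INTEGER,
                     LastUpdated TEXT, TotalInvoice REAL);
-- Assumption: a table standing in for the view, with only the columns used here.
CREATE TABLE ViewB (RepairShopId INTEGER, MinimumValue REAL,
                    MaximumValue REAL, year INTEGER, month INTEGER);
INSERT INTO TableA VALUES
  (1, 10, '2017-06-01 07:00:00', 765),
  (3, 10, '2017-10-19 12:15:00', 295679),
  (7, 10, '2016-10-18 11:15:00', 1950),
  (14, 10, '2016-10-07 15:30:00', NULL);
INSERT INTO ViewB VALUES
  (10, 10, 90942.6052417394, 2016, 10),
  (10, 10, 204538.144334771, 2017, 6),
  (10, 10, 72996.6054527277, 2017, 10);
""")

kept = conn.execute("""
    SELECT A.RepairOrderDataId
    FROM TableA A
    JOIN ViewB B
      ON A.RepairShopId = B.RepairShopId
     AND A.TotalInvoice > B.MinimumValue
     AND A.TotalInvoice < B.MaximumValue
     -- strftime returns text such as '06', so cast before comparing to integers
     AND CAST(strftime('%Y', A.LastUpdated) AS INTEGER) = B.year
     AND CAST(strftime('%m', A.LastUpdated) AS INTEGER) = B.month
    ORDER BY A.RepairOrderDataId
""").fetchall()

print(kept)
```

Order 3 is dropped because its invoice exceeds the October 2017 maximum, and order 14 is dropped because a NULL invoice never satisfies the range comparison.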

Select statement for overlapping dates

I need a SELECT query that returns the RoomIDs of rows in which the dates overlap each other.
For example, clients 10 and 6 arrive on different days, but they are assigned to the same room during their stay at the hotel.
RoomID ArrivalDate DepartureDate ClientID
2 2020-11-02 2021-11-10 10
2 2021-11-01 2021-11-11 6
4 2021-10-18 2021-10-20 4
4 2021-12-13 2021-12-21 11
4 2021-12-14 2021-12-21 12
8 2021-12-10 2021-12-19 8
9 2021-09-20 2021-09-25 2
9 2021-09-21 2021-09-25 1
9 2021-12-10 2021-12-15 7
10 2021-10-19 2021-10-26 5
11 2021-10-02 2021-10-10 3
11 2021-12-12 2021-12-18 9
12 2021-10-04 2021-10-09 2
CREATE DATABASE Hotel;
CREATE TABLE reservations (
roomID INT NOT NULL,
ArrivalDate DATE NOT NULL,
DepartureDate DATE NOT NULL,
clientID INT NOT NULL,
PRIMARY KEY (roomID, ArrivalDate),
CHECK (ArrivalDate <= DepartureDate)
);
I appreciate any help.

You can get overlaps using exists:
select t.*
from t
where exists (select 1
from t t2
where t2.RoomID = t.RoomId and
t2.ClientID <> t.ClientId and
t2.ArrivalDate < t.DepartureDate and
t2.DepartureDate > t.ArrivalDate
);
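As a quick check of the overlap condition, the following sketch runs the same EXISTS pattern through Python's sqlite3 module on a few rows from the question. The table name reservations comes from the question's DDL; storing the dates as ISO text is an assumption that works here because such strings compare in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations (roomID INTEGER, ArrivalDate TEXT, "
    "DepartureDate TEXT, clientID INTEGER)"
)
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?, ?, ?)",
    [
        (4, "2021-10-18", "2021-10-20", 4),
        (4, "2021-12-13", "2021-12-21", 11),
        (4, "2021-12-14", "2021-12-21", 12),
        (8, "2021-12-10", "2021-12-19", 8),
        (9, "2021-09-20", "2021-09-25", 2),
        (9, "2021-09-21", "2021-09-25", 1),
        (9, "2021-12-10", "2021-12-15", 7),
    ],
)

# Two stays overlap when each one starts before the other one ends.
overlapping = conn.execute("""
    SELECT r.roomID, r.clientID
    FROM reservations r
    WHERE EXISTS (SELECT 1
                  FROM reservations r2
                  WHERE r2.roomID = r.roomID
                    AND r2.clientID <> r.clientID
                    AND r2.ArrivalDate < r.DepartureDate
                    AND r2.DepartureDate > r.ArrivalDate)
    ORDER BY r.roomID, r.clientID
""").fetchall()

print(overlapping)
```

With strict inequalities, a stay that begins on the day another one ends is not reported as an overlap; use <= and >= if same-day turnover should count.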

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have attempted using lag() but I seem not to understand how to use it yet.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.

As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
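The same lead-then-median logic can be cross-checked outside the database. The sketch below is a plain Python re-implementation, not the Oracle query itself: for every zero balance it measures the gap to the next row of the same ID and takes the median per ID. The variable names are hypothetical, and only the rows around the zero balances of ID 20 are included.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# (id, timestamp, running amount) taken from the question's sample data
rows = [
    (10, "2019-06-27 14:30", 100),
    (10, "2019-06-29 15:26", 0),
    (10, "2019-07-03 01:56", 83),
    (10, "2019-07-04 17:53", 98),
    (10, "2019-07-05 15:09", 0),
    (10, "2019-07-05 15:53", 98.98),
    (10, "2019-07-05 19:54", 0),
    (10, "2019-07-07 01:36", 90.97),
    (10, "2019-07-07 13:02", 0),
    (10, "2019-07-07 16:32", 39.88),
    (10, "2019-07-08 13:41", 89.88),
    (20, "2019-01-09 23:25", 861.88),
    (20, "2019-01-10 16:18", 0),
    (20, "2019-01-12 16:46", 894.49),
    (20, "2019-01-25 05:40", 23.44),
]

by_id = defaultdict(list)
for id_, ts, balance in rows:
    by_id[id_].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), balance))

median_age = {}
for id_, events in by_id.items():
    events.sort()  # chronological order within each ID
    # Pair each row with its successor (the equivalent of lead()) and keep
    # only the gaps that start at a zero balance, expressed in days.
    diffs = [
        (nxt[0] - cur[0]).total_seconds() / 86400
        for cur, nxt in zip(events, events[1:])
        if cur[1] == 0
    ]
    if diffs:
        median_age[id_] = median(diffs)

for id_, days in median_age.items():
    print(id_, round(days, 9), round(days))
```

A zero balance on the last row of an ID has no successor, so it contributes no gap, which mirrors median() ignoring the NULL that lead() returns there.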

How do I aggregate hourly data to daily, weekly, monthly and yearly values in PostgreSQL?

I have a table my_data in my PostgreSQL 9.5 database containing hourly data. The sample data is like:
ID Date hour value
1 01/01/2014 1 9.947484
2 01/01/2014 2 9.161652
3 01/01/2014 3 8.509986
4 01/01/2014 4 7.666654
5 01/01/2014 5 7.110822
6 01/01/2014 6 6.765822
7 01/01/2014 7 6.554989
8 01/01/2014 8 6.574156
9 01/01/2014 9 6.09499
10 01/01/2014 10 8.471653
11 01/01/2014 11 11.36581
12 01/01/2014 12 11.25081
13 01/01/2014 13 9.391651
14 01/01/2014 14 6.976655
15 01/01/2014 15 6.574156
16 01/01/2014 16 6.420823
17 01/01/2014 17 6.229156
18 01/01/2014 18 5.577491
19 01/01/2014 19 4.964159
20 01/01/2014 20 6.593323
21 01/01/2014 21 7.321654
22 01/01/2014 22 9.295818
23 01/01/2014 23 8.241653
24 01/01/2014 24 7.014989
25 02/01/2014 1 6.842489
26 02/01/2014 2 7.513321
27 02/01/2014 3 7.244988
28 02/01/2014 4 5.80749
29 02/01/2014 5 5.481658
30 02/01/2014 6 6.669989
.. .. .. ..
and so on. The data exist for many years in the same manner. Structure of above table is: ID (integer serial not null), Date (date) (mm/dd/yyyy), hour (integer), value (numeric). For a large set of data like above, how do I find daily, weekly, monthly and yearly averages in PostgreSQL?

You use aggregation. For instance:
select date, avg(value)
from t
group by date
order by date;
For the rest, use date_trunc():
select date_trunc('month', date) as yyyymm, avg(value)
from t
group by yyyymm
order by yyyymm;
This assumes that date is stored as a date data type. If it is stored as a string you should fix the data type in your data. You can convert it to a date using to_date().
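To illustrate how one grouping expression covers several granularities, here is a rough sketch with Python's sqlite3 module. SQLite has no date_trunc(), so strftime() patterns play that role here. The helper average_by and the four data rows are made up for the example, and the dates are stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (date TEXT, hour INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO my_data VALUES (?, ?, ?)",
    [
        # Hypothetical values, chosen so the averages are easy to verify by hand.
        ("2014-01-01", 1, 10.0),
        ("2014-01-01", 2, 20.0),
        ("2014-01-02", 1, 30.0),
        ("2014-02-01", 1, 40.0),
    ],
)


def average_by(fmt):
    """Return (bucket, average value) pairs, where fmt is a strftime pattern."""
    return conn.execute(
        "SELECT strftime(?, date) AS bucket, AVG(value) "
        "FROM my_data GROUP BY bucket ORDER BY bucket",
        (fmt,),
    ).fetchall()


daily = average_by("%Y-%m-%d")
monthly = average_by("%Y-%m")
yearly = average_by("%Y")

print(daily)
print(monthly)
print(yearly)
```

A weekly bucket could use a pattern such as "%Y-%W" in this sketch; in PostgreSQL the corresponding call would be date_trunc('week', date).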

I need help in sql query to get this o/p

I have a table structure as follows..
and here is sample data...
tblTeam
----------------------------------
Name TeamID
Royal Challengers Bangalore 1
Chennai Super Kings 2
Delhi Daredevils 3
Sunrisers Hyderabad 4
Kolkata Knight Riders 5
Mumbai Indians 6
Kings XI Punjab 7
Rajasthan Royals 8
Deccan Chargers 9
Kochi Tuskers Kerala 10
Pune Warriors 11
------------------------------------------------
tblSchedule
------------------------------------------------
ScheduleID DateTime Team_1 Team_2 VenuID
1 4/18/08 8:00 PM 1 5 6
2 4/19/08 5:00 PM 2 7 9
3 4/19/08 8:30 PM 3 8 4
4 4/20/08 4:30 PM 5 9 1
5 4/20/08 8:00 PM 1 6 5
6 4/21/08 8:00 PM 8 7 27
7 4/22/08 8:00 PM 3 9 10
8 4/23/08 8:00 PM 2 6 2
9 4/24/08 8:00 PM 8 9 10
10 4/25/08 8:00 PM 6 7 9
11 4/26/08 4:00 PM 5 2 2
12 4/26/08 8:00 PM 1 8 6
-----------------------------------------------
In the picture, the yellow key denotes the primary key and the blue one the foreign key.
and my requirement is like this....
DateTime Team-1 Team-2
Apr 8, 2015 8:00:00 PM Kolkata Knight Riders Mumbai Indians
Please help to get that o/p...

Join tblTeam twice with different alias names (T1 & T2):
SELECT ScheduleID,DateTime,T1.Name as [Team-1],T2.Name as [Team-2]
FROM tblSchedule S JOIN
tblTeam T1 ON S.Team_1=T1.TeamID JOIN
tblTeam T2 ON S.Team_2=T2.TeamID
ORDER BY S.ScheduleID
Sample Result:
ScheduleID DateTime Team-1 Team-2
----------------------------------------------------------------------------------------
1 April, 18 2008 20:00:00 Royal Challengers Bangalore Kolkata Knight Riders
2 April, 19 2008 17:00:00 Chennai Super Kings Kings XI Punjab
3 April, 19 2008 20:30:00 Delhi Daredevils Rajasthan Royals
4 April, 20 2008 16:30:00 Kolkata Knight Riders Deccan Chargers
5 April, 20 2008 20:00:00 Royal Challengers Bangalore Mumbai Indians
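The double join can be reproduced in a few lines with Python's sqlite3 module. This is a minimal sketch with only three teams and two fixtures from the question; the DateTime and VenuID columns are left out for brevity, which is an assumption of the example rather than part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblTeam (Name TEXT, TeamID INTEGER PRIMARY KEY);
CREATE TABLE tblSchedule (ScheduleID INTEGER PRIMARY KEY,
                          Team_1 INTEGER, Team_2 INTEGER);
INSERT INTO tblTeam VALUES
  ('Royal Challengers Bangalore', 1),
  ('Kolkata Knight Riders', 5),
  ('Mumbai Indians', 6);
INSERT INTO tblSchedule VALUES (1, 1, 5), (5, 1, 6);
""")

# tblTeam is joined once per team column, each time under its own alias.
fixtures = conn.execute("""
    SELECT S.ScheduleID, T1.Name, T2.Name
    FROM tblSchedule S
    JOIN tblTeam T1 ON S.Team_1 = T1.TeamID
    JOIN tblTeam T2 ON S.Team_2 = T2.TeamID
    ORDER BY S.ScheduleID
""").fetchall()

for schedule_id, team_1, team_2 in fixtures:
    print(schedule_id, team_1, "vs", team_2)
```

Each alias acts as an independent copy of the lookup table, so the two name columns resolve separately even when the same team appears on either side.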

I like to use subqueries for this type of problem to avoid the extra join product.
SELECT
CONVERT(varchar(20), DateTime, 100) AS DateTime,
(SELECT Name FROM tblTeam WHERE s.Team_1 = TeamID) AS [Team-1],
(SELECT Name FROM tblTeam WHERE s.Team_2 = TeamID) AS [Team-2]
FROM tblSchedule s