Do calculations in bigquery within row and median - is this possible? - google-bigquery

My problem is that I get a raw set of sensor data that needs some processing before I can use it. Loading the data to the client and doing the processing there is pretty slow, so I am looking for a way to offload this logic to BigQuery.
Imagine I have some constants for a set of sensors. They can change, but I have them at the time I want to run the query:
A: 1, B: 2, C: 3, D: 2, E: 1, F: 2
Sensors are connected, and I know which sensors are connected to each other. This matters in the steps below.
A: BC
D: EF
This is a table with measurements per timestamp per sensor. Imagine thousands of rows.
TS A | B | C | D | E | F
01 10 | 20 | 20 | 10 | 15 | 10
02 11 | 10 | 20 | 20 | 10 | 10
03 12 | 20 | 10 | 10 | 12 | 11
04 13 | 10 | 10 | 20 | 15 | 15
05 11 | 20 | 10 | 15 | 14 | 14
06 10 | 20 | 10 | 10 | 15 | 12
I want to query ts 01 to ts 06 (in reality this can again be thousands of rows). I don't want it to return this raw data, but to do some calculations first:
First, for each row, I need to subtract the constants, so row 01 would look like:
01 9 | 18 | 17 | 8 | 14 | 8
Then B and C need to have A subtracted, and E and F need to have D subtracted:
01 9 | 9 | 8 | 8 | 6 | 0
As the last step, once I have all rows, I want to return rows where each sensor has the median value of the preceding X rows for that sensor. So
TS A | B |
01 10 | 1 |
02 11 | 2 |
03 12 | 2 |
04 13 | 1 |
05 11 | 2 |
06 10 | 3 |
07 10 | 4 |
08 11 | 2 |
09 12 | 2 |
10 13 | 10 |
11 11 | 20 |
12 10 | 20 |
returns (for X = 4)
TS A | B |
//first 3 needed for median for 4th value
04 11.5 | etc | //median 10, 11, 12, 13
05 11.5 | etc | //median 11, 12, 13, 11
06 11.5 | etc | //median 12, 13, 11, 10
07 etc | etc |
08 etc | etc |
09 etc | etc |
10 etc | etc |
11 etc | etc |
12 etc | etc |
Getting the data to my server and doing the calculation there is very slow. I am wondering whether BigQuery can handle these amounts of data, so that I can quickly get a calculated set with my own settings of choice.
I do this in Node.js today, but in BigQuery SQL I am lost.

Below is for BigQuery Standard SQL.
If you were looking for AVG values, this would be as "simple" as the query below:
#standardSQL
WITH constants AS (
SELECT 1 val_a, 2 val_b, 3 val_c, 2 val_d, 1 val_e, 2 val_f
), temp AS (
SELECT ts,
a - val_a AS a,
b - val_b - a + val_a AS b,
c - val_c - a + val_a AS c,
d - val_d AS d,
e - val_e - d + val_d AS e,
f - val_f - d + val_d AS f
FROM `project.dataset.measurements`, constants
)
SELECT ts,
AVG(a) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) a,
AVG(b) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) b,
AVG(c) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) c,
AVG(d) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) d,
AVG(e) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) e,
AVG(f) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) f
FROM temp
For MEDIAN you need a little extra, as in the example below:
#standardSQL
WITH constants AS (
SELECT 1 val_a, 2 val_b, 3 val_c, 2 val_d, 1 val_e, 2 val_f
), temp AS (
SELECT ts,
a - val_a AS a,
b - val_b - a + val_a AS b,
c - val_c - a + val_a AS c,
d - val_d AS d,
e - val_e - d + val_d AS e,
f - val_f - d + val_d AS f
FROM `project.dataset.measurements`, constants
)
SELECT ts,
(SELECT PERCENTILE_CONT(a, 0.5) OVER() FROM UNNEST(a) a LIMIT 1) a,
(SELECT PERCENTILE_CONT(b, 0.5) OVER() FROM UNNEST(b) b LIMIT 1) b,
(SELECT PERCENTILE_CONT(c, 0.5) OVER() FROM UNNEST(c) c LIMIT 1) c,
(SELECT PERCENTILE_CONT(d, 0.5) OVER() FROM UNNEST(d) d LIMIT 1) d,
(SELECT PERCENTILE_CONT(e, 0.5) OVER() FROM UNNEST(e) e LIMIT 1) e,
(SELECT PERCENTILE_CONT(f, 0.5) OVER() FROM UNNEST(f) f LIMIT 1) f
FROM (
SELECT ts,
ARRAY_AGG(a) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) a,
ARRAY_AGG(b) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) b,
ARRAY_AGG(c) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) c,
ARRAY_AGG(d) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) d,
ARRAY_AGG(e) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) e,
ARRAY_AGG(f) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) f
FROM temp
)
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.measurements` AS (
SELECT 01 ts, 10 a, 20 b, 20 c, 10 d, 15 e, 10 f UNION ALL
SELECT 02, 11, 10, 20, 20, 10, 10 UNION ALL
SELECT 03, 12, 20, 10, 10, 12, 11 UNION ALL
SELECT 04, 13, 10, 10, 20, 15, 15 UNION ALL
SELECT 05, 11, 20, 10, 15, 14, 14 UNION ALL
SELECT 06, 10, 20, 10, 10, 15, 12
), constants AS (
SELECT 1 val_a, 2 val_b, 3 val_c, 2 val_d, 1 val_e, 2 val_f
), temp AS (
SELECT ts,
a - val_a AS a,
b - val_b - a + val_a AS b,
c - val_c - a + val_a AS c,
d - val_d AS d,
e - val_e - d + val_d AS e,
f - val_f - d + val_d AS f
FROM `project.dataset.measurements`, constants
)
SELECT ts,
(SELECT PERCENTILE_CONT(a, 0.5) OVER() FROM UNNEST(a) a LIMIT 1) a,
(SELECT PERCENTILE_CONT(b, 0.5) OVER() FROM UNNEST(b) b LIMIT 1) b,
(SELECT PERCENTILE_CONT(c, 0.5) OVER() FROM UNNEST(c) c LIMIT 1) c,
(SELECT PERCENTILE_CONT(d, 0.5) OVER() FROM UNNEST(d) d LIMIT 1) d,
(SELECT PERCENTILE_CONT(e, 0.5) OVER() FROM UNNEST(e) e LIMIT 1) e,
(SELECT PERCENTILE_CONT(f, 0.5) OVER() FROM UNNEST(f) f LIMIT 1) f
FROM (
SELECT ts,
ARRAY_AGG(a) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) a,
ARRAY_AGG(b) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) b,
ARRAY_AGG(c) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) c,
ARRAY_AGG(d) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) d,
ARRAY_AGG(e) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) e,
ARRAY_AGG(f) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) f
FROM temp
)
-- ORDER BY ts
with result
Row ts a b c d e f
1 1 null null null null null null
2 2 9.0 9.0 8.0 8.0 6.0 0.0
3 3 9.5 3.5 7.5 13.0 -1.5 -5.0
4 4 10.0 7.0 7.0 8.0 3.0 0.0
5 5 10.5 2.5 1.5 13.0 -0.5 -2.5
6 6 10.5 2.5 -3.5 15.5 -2.0 -3.0
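To cross-check the numbers above outside BigQuery, here is a minimal Python sketch of the same three steps. The names `CONSTANTS`, `PARENTS`, `adjust` and `rolling_median` are made up for this illustration. It assumes C = 20 for ts 01, as in the sample query. By default it uses the same window as the queries above (up to 4 rows before the current one); the `include_current` flag switches to a window that ends at the current row, which is how the question's own example (median of 10, 11, 12, 13 for ts 04) is computed.

```python
from statistics import median

# Per-sensor constants and sensor connections from the question.
CONSTANTS = {"a": 1, "b": 2, "c": 3, "d": 2, "e": 1, "f": 2}
PARENTS = {"b": "a", "c": "a", "e": "d", "f": "d"}  # child -> parent

def adjust(row):
    """Subtract each constant, then subtract the adjusted parent from its children."""
    base = {k: row[k] - CONSTANTS[k] for k in CONSTANTS}
    return {k: base[k] - base[PARENTS[k]] if k in PARENTS else base[k] for k in base}

def rolling_median(values, size, include_current=False):
    """Median of up to `size` rows before (or ending at) each row; None if the window is empty."""
    out = []
    for i in range(len(values)):
        end = i + 1 if include_current else i
        window = values[max(0, end - size):end]
        out.append(median(window) if window else None)
    return out

rows = [
    {"ts": 1, "a": 10, "b": 20, "c": 20, "d": 10, "e": 15, "f": 10},
    {"ts": 2, "a": 11, "b": 10, "c": 20, "d": 20, "e": 10, "f": 10},
    {"ts": 3, "a": 12, "b": 20, "c": 10, "d": 10, "e": 12, "f": 11},
    {"ts": 4, "a": 13, "b": 10, "c": 10, "d": 20, "e": 15, "f": 15},
    {"ts": 5, "a": 11, "b": 20, "c": 10, "d": 15, "e": 14, "f": 14},
    {"ts": 6, "a": 10, "b": 20, "c": 10, "d": 10, "e": 15, "f": 12},
]

adjusted = [adjust(r) for r in rows]
for sensor in CONSTANTS:
    print(sensor, rolling_median([r[sensor] for r in adjusted], 4))
```

If you want the window from the question's example instead, the equivalent change in the SQL would be a frame of ROWS BETWEEN 3 PRECEDING AND CURRENT ROW.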

Related

Grab one record per ID with multiple Lead values

I have a table like this:
ID | Val | Quantity
----------------------
1 | A | 11
1 | B | 15
1 | B | 19
1 | Z | 45
2 | D | 4
2 | E | 25
2 | F | 13
2 | Y | 2
3 | G | 10
3 | H | 15
3 | I | 19
I want to select the top record for each ID ordered by VAL, Quantity AND add the next 2 Val/Quantity within the sort as columns to that row. My expected output looks like this:
ID | Val | Quantity | VAL2 | Quantity2 | VAL3 | Quantity3
-------------------------------------------------------------------
1 | A | 11 | B | 15 | B | 19
2 | B | 15 | D | 4 | E | 25
3 | C | 19 | G | 10 | H | 15
I've almost done it using lead, but I don't know how to get rid of the rest of the records in my data-set, as I only want the top.
SELECT ID,
VAL,
Quantity,
lead(VAL,1) over (order by VAL, Quantity ASC) as Val2,
lead(Quantity,1) over (order by VAL, Quantity ASC) as Quantity2,
lead(VAL,2) over (order by VAL, Quantity ASC) as Val3,
lead(Quantity,2) over (order by VAL, Quantity ASC) as Quantity3,
FROM MY_TABLE
order by VAL, Quantity ASC
How can I only select the top record for each ID, while maintaining the lead records? Or is there a more elegant/efficient way to do this?
From your question it seems the expected output should actually be:
ID VAL QUANTITY VAL2 QUANTITY2 VAL3 QUANTITY3
1 A 11 B 15 B 19
2 D 4 E 25 F 13
3 G 10 H 15 I 19
You can get this result with a CTE which generates the LEAD values, as well as a ROW_NUMBER for each set of values. You can then select the first row for each ID from the CTE:
WITH CTE AS (
SELECT ID,
Val, Quantity,
LEAD(Val) OVER (PARTITION BY ID ORDER BY Val, Quantity) AS Val2,
LEAD(Quantity) OVER (PARTITION BY ID ORDER BY Val, Quantity) AS Quantity2,
LEAD(Val, 2) OVER (PARTITION BY ID ORDER BY Val, Quantity) AS Val3,
LEAD(Quantity, 2) OVER (PARTITION BY ID ORDER BY Val, Quantity) AS Quantity3,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Val, Quantity) AS rn
FROM MY_TABLE
)
SELECT ID, Val, Quantity, Val2, Quantity2, Val3, Quantity3
FROM CTE
WHERE rn = 1
Demo on SQLFiddle
You may use ROW_NUMBER to define the order of the rows within the same ID:
with t as (
select ID, VAL, QUANTITY,
row_number() over (partition by ID order by VAL, QUANTITY) as rn
from tab)
select *
from t
ID V QUANTITY RN
---------- - ---------- ----------
1 A 11 1
1 B 15 2
1 B 19 3
1 Z 45 4
2 D 4 1
...
In the next step use PIVOT to get the best three values in one row.
with t as (
select ID, VAL, QUANTITY,
row_number() over (partition by ID order by VAL, QUANTITY) as rn
from tab)
select *
from t
PIVOT
(MAX(VAL) as VAL, MAX(QUANTITY) as QUANTITY FOR RN IN (1 as "COL1" ,2 as "COL2",3 as "COL3")
)
ID C COL1_QUANTITY C COL2_QUANTITY C COL3_QUANTITY
---------- - ------------- - ------------- - -------------
1 A 11 B 15 B 19
2 D 4 E 25 F 13
3 G 10 H 15 I 19
If the standard pivot column naming does not suit you, simply wrap this in another query and rename the columns.
Note that this query returns the same result as the alternative approach based on multiple LEAD columns, but it gives you a bit more flexibility if you plan to vary the number of tracked columns.
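As a plain illustration of what both answers compute, here is a small Python sketch (not SQL, and not part of either answer). The function name `first_with_next_two` is hypothetical. It sorts by ID, Val, Quantity, takes the first three rows per ID and flattens them into one row, padding with None where LEAD would return NULL.

```python
from itertools import groupby

# Sample rows from the question: (ID, Val, Quantity).
rows = [
    (1, "A", 11), (1, "B", 15), (1, "B", 19), (1, "Z", 45),
    (2, "D", 4), (2, "E", 25), (2, "F", 13), (2, "Y", 2),
    (3, "G", 10), (3, "H", 15), (3, "I", 19),
]

def first_with_next_two(rows):
    """One output row per ID: the top (Val, Quantity) plus the next two in sort order."""
    result = []
    # Sorting the tuples orders by ID, then Val, then Quantity.
    for id_, group in groupby(sorted(rows), key=lambda r: r[0]):
        top3 = [(val, qty) for _, val, qty in group][:3]
        top3 += [(None, None)] * (3 - len(top3))  # pad short groups, like LEAD's NULLs
        result.append((id_, *[x for pair in top3 for x in pair]))
    return result

for row in first_with_next_two(rows):
    print(row)
```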

use preceding in calculation sql

I need to calculate the column E using column B,C,D & previous row of E... I have the sample statement and calculation for reference. Note that prev(E) is the preceding value of E which I need to use in calculation but am unable to.
+---------------------------------------------------------------------------------------------------------------------------------------+
| TransactionDt | total_allotment(B) | invchange(C) | roomssold_flag(D) | available(E) | samplestatement | calculation |
+---------------------------------------------------------------------------------------------------------------------------------------+
| 1/1/16 | 5 | 0 | null | 5 | E=case when D=null then B | 5 |
| 1/2/16 | 5 | 0 | 1 | 4 | E=case when C=0 then prev(E)-D | E=(5-1) |
| 1/3/16 | 5 | 0 | 0 | 4 | E=case when C=0 then prev(E)-D | E=(4-0) |
| 1/4/16 | 6 | 1 | 1 | 5 | E=case when C=1 then B-D | E=(6-1) |
| 1/5/16 | 6 | 0 | 0 | 5 | E=case when C=0 then prev(E)-D | E=(5-0) |
| 1/6/16 | 7 | 1 | 1 | 6 | E=case when C=1 then B-D | E=(7-1) |
+---------------------------------------------------------------------------------------------------------------------------------------+
You can use the first_value() function with a preceding clause to get the previous value:
set dateformat dmy;
create table #t (TransactionDt smalldatetime, b int, c int, d int, e int);
insert into #t (TransactionDt, b, c, d, e) values
(cast('01.01.2016' as date), 5, 0, null, 5),
(cast('02.01.2016' as date), 5, 0, 1, 4),
(cast('03.01.2016' as date), 5, 0, 0, 4),
(cast('04.01.2016' as date), 6, 1, 1, 5),
(cast('05.01.2016' as date), 6, 0, 0, 5),
(cast('06.01.2016' as date), 7, 1, 1, 6);
select
t.*
,first_value(t.e) over(order by t.TransactionDt asc rows 1 preceding) [prevE]
,case t.c
when 0 then
first_value(t.e)
over(order by t.TransactionDt asc rows 1 preceding)
- t.d
when 1 then
t.b - t.d
end [calculation]
from
#t t
order by
t.TransactionDt
;
Tested on MS SQL 2012.
I'm not a big fan of Teradata, but this should work:
select
t.e
,sum(t.e)
over(order by t.TransactionDt asc rows between 1 preceding and 1 preceding) ePrev
,case t.c
when 0 then
sum(t.e)
over(order by t.TransactionDt asc rows between 1 preceding and 1 preceding)
- t.d
when 1 then
t.b - t.d
end calculation
from
(
select cast('01.01.2016' as date format 'dd.mm.yyyy') TransactionDt, 5 b, 0 c, null d, 5 e from (select 1 x) x
union all
select cast('02.01.2016' as date format 'dd.mm.yyyy') TransactionDt, 5 b, 0 c, 1 d, 4 e from (select 1 x) x
union all
select cast('03.01.2016' as date format 'dd.mm.yyyy'), 5, 0, 0, 4 from (select 1 x) x
union all
select cast('04.01.2016' as date format 'dd.mm.yyyy'), 6, 1, 1, 5 from (select 1 x) x
union all
select cast('05.01.2016' as date format 'dd.mm.yyyy'), 6, 0, 0, 5 from (select 1 x) x
union all
select cast('06.01.2016' as date format 'dd.mm.yyyy'), 7, 1, 1, 6 from (select 1 x) x
) t
order by
t.TransactionDt
;
When you need to restart the calculation whenever invchange=1 you have to create a group for partitioning using
sum(invchange)
over (order by TransactionDt
rows unbounded preceding) as grp
invchange seems to be based on a previous-row query, so you need to nest its calculation in a Derived Table.
Now it's the total_allotment value minus a Cumulative Sum over roomssold_flag:
select t.*,
b - sum(coalesce(D,0))
over (partition by grp
order by TransactionDt
rows unbounded preceding)
from
(
select TransactionDt,b,c,d,
sum(c) over (order by TransactionDt rows unbounded preceding) as grp
from t
) as t
Btw, using a 0/1 flag to get dynamic partitioning is similar to RESET WHEN.
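The grouping idea above can be sketched in a few lines of Python, just to show the arithmetic. The function name `available` and the tuple layout are assumptions made for this illustration. Each row is (total_allotment, invchange, roomssold_flag), the running count of sold rooms restarts whenever invchange = 1, and a NULL flag counts as 0, like the coalesce in the query.

```python
# Rows from the question: (total_allotment B, invchange C, roomssold_flag D).
rows = [
    (5, 0, None),
    (5, 0, 1),
    (5, 0, 0),
    (6, 1, 1),
    (6, 0, 0),
    (7, 1, 1),
]

def available(rows):
    """B minus the cumulative sum of D, restarting the sum whenever C = 1."""
    result = []
    sold = 0
    for total, invchange, sold_flag in rows:
        if invchange == 1:
            sold = 0              # a new group starts here
        sold += sold_flag or 0    # treat NULL as 0
        result.append(total - sold)
    return result

print(available(rows))
```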

How to select ranges in a range of record in oracle

If I have a table like this
Number Status
------ ------
1 A
2 A
3 A
4 U
5 U
6 A
7 U
8 U
9 A
10 A
What query can I use to group the range into ranges where Status = A?
Range Count Status
----- ----- ------
1-3 3 A
6-6 1 A
9-10 2 A
My query is
select min(number) || '--' || max(number), count(*), Status
from table
where Status = 'A'
group by Status
Range Count Status
----- ----- ------
1-10 6 A
A nice way to do this is the "Tabibitosan method", a name given by Aketi Jyuuzou.
WITH data AS
 (SELECT num - DENSE_RANK() OVER(PARTITION BY status ORDER BY num) grp,
         status,
         num
  FROM t
 )
SELECT MIN(num)
       ||' - '
       || MAX(num) range,
       COUNT(*) cnt
FROM data
WHERE status='A'
GROUP BY grp
ORDER BY grp;
RANGE CNT
------ ----------
1 - 3 3
6 - 6 1
9 - 10 2
Note: it is better to use DENSE_RANK, so that duplicate values do not break the grouping.
Table
SELECT * FROM t ORDER BY num;
NUM S
---------- -
1 A
1 A
2 A
2 A
3 A
4 U
5 U
6 A
7 U
8 U
9 A
10 A
12 rows selected.
There are duplicates for num = 1 and num = 2.
Using DENSE_RANK:
WITH data AS
 (SELECT num - DENSE_RANK() OVER(PARTITION BY status ORDER BY num) grp,
         status,
         num
  FROM t
 )
SELECT MIN(num)
       ||' - '
       || MAX(num) range,
       COUNT(*) cnt
FROM data
WHERE status='A'
GROUP BY grp
ORDER BY grp;
RANGE CNT
------ ----------
1 - 3 5
6 - 6 1
9 - 10 2
Using ROW_NUMBER:
WITH data AS
 (SELECT num - ROW_NUMBER() OVER(PARTITION BY status ORDER BY num) grp,
         status,
         num
  FROM t
 )
SELECT MIN(num)
       ||' - '
       || MAX(num) range,
       COUNT(*) cnt
FROM data
WHERE status='A'
GROUP BY grp
ORDER BY grp;
RANGE CNT
------ ----------
2 - 3 2
1 - 2 2
1 - 6 2
9 - 10 2
So, in case of duplicates, the ROW_NUMBER query would give incorrect results. You should use DENSE_RANK.
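To see why the subtraction works, here is a rough Python sketch of the same idea (the function name `status_ranges` is made up for this illustration). Within a run of consecutive numbers, the value and its dense rank grow in step, so their difference stays constant and can serve as the group key.

```python
def status_ranges(rows, status="A"):
    """Group consecutive numbers with the given status; return (range, count) pairs."""
    nums = sorted(n for n, s in rows if s == status)
    # Dense rank: equal values share a rank, and ranks have no gaps.
    dense_rank = {n: i for i, n in enumerate(sorted(set(nums)), start=1)}
    groups = {}
    for n in nums:
        groups.setdefault(n - dense_rank[n], []).append(n)
    return [(f"{min(g)} - {max(g)}", len(g)) for _, g in sorted(groups.items())]

# The table with duplicates shown above.
rows = [(1, "A"), (1, "A"), (2, "A"), (2, "A"), (3, "A"), (4, "U"),
        (5, "U"), (6, "A"), (7, "U"), (8, "U"), (9, "A"), (10, "A")]

print(status_ranges(rows))
```

Replacing the dense rank with a plain running row number in this sketch reproduces the broken groups from the ROW_NUMBER query.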
SQL Fiddle
Oracle 11g R2 Schema Setup:
create table x(
num_ number,
status_ varchar2(1)
);
insert into x values(1,'A');
insert into x values(2,'A');
insert into x values(3,'A');
insert into x values(4,'U');
insert into x values(5,'U');
insert into x values(6,'A');
insert into x values(7,'U');
insert into x values(8,'U');
insert into x values(9,'A');
insert into x values(10,'A');
Query 1:
select min(num_) || '-' || max(num_) range_, status_,
count(1) count_
from
(
select num_, status_,
num_ - row_number() over (order by status_, num_) y --gives a group number to each groups, which have same status over consecutive records.
from x
)
where status_ = 'A'
group by y, status_
order by range_
Results:
| RANGE_ | STATUS_ | COUNT_ |
|--------|---------|--------|
| 1-3 | A | 3 |
| 6-6 | A | 1 |
| 9-10 | A | 2 |

SQL Server 2008 Cumulative Sum that resets value

I want to have the last column cumulative based on ROW_ID that resets every time it starts again with '1'.
Initially my table doesn't have the ROW_ID, this was created using partition so at least I can segregate my records.
It should add the Amt + CumulativeSum (except for the first record) all the way down and reset every time the Row_ID = 1.
I have tried several queries but it doesn't give me the desired result. I am trying to read answers from several forums but to no avail.
Can someone advise the best approach to do this?
For the sake of representation, I made the sample table as straightforward as possible.
ID ROW-ID Amt RunningTotal(Amt)
1 1 2 2
2 2 4 6
3 3 6 12
4 1 2 2
5 2 4 6
6 3 6 12
7 4 8 20
8 5 10 30
9 1 2 2
10 2 4 6
11 3 6 12
12 4 8 20
Try this (the windowed SUM with ORDER BY requires SQL Server 2012 or later):
create table #tb(ID int, [ROW-ID] int, Amt money)
insert into #tb(ID, [ROW-ID], Amt) values
(1,1,2),
(2,2,4),
(3,3,6),
(4,1,2),
(5,2,4),
(6,3,6),
(7,4,8),
(8,5,10),
(9,1,2),
(10,2,4),
(11,3,6),
(12,4,8)
select *,sum(amt) over(partition by ([id]-[row-id]) order by id,[row-id]) AS cum from #tb
Another version, using a correlated subquery, which also works on SQL Server 2008:
select *,(select sum(amt) from #tb t where
(t.id-t.[row-id])=(t1.id-t1.[ROW-ID]) and (t.id<=t1.id) ) as cum
from #tb t1 order by t1.id,t1.[row-id]
Try this
SELECT distinct (T1.ID),
T1.ROW_ID,
T1.Amt,
CumulativeSum =
CASE
WHEN T1.RoW_ID=1 THEN T1.Amt
ELSE T1.Amt+ T2.Amt
END
FROM TestSum T1, TestSum T2
WHERE T1.ID = T2.ID+1
http://sqlfiddle.com/#!6/8b2a2/2
The idea is to create partitions from the R column. First keep 1 where R = 1 and put 0 everywhere else. Then take a cumulative sum over that column, which gives each block its own partition number. Once you have the partitions, you can calculate cumulative sums of the S column within them:
--- --- ---
| 1 | | 1 | | 1 |
| 2 | | 0 | | 1 | --prev 1 + 0
| 3 | | 0 | | 1 | --prev 1 + 0
| 1 | | 1 | | 2 | --prev 1 + 1
| 2 | => | 0 | => | 2 | --prev 2 + 0
| 3 | | 0 | | 2 | --prev 2 + 0
| 4 | | 0 | | 2 | --prev 2 + 0
| 5 | | 0 | | 2 | --prev 2 + 0
| 1 | | 1 | | 3 | --prev 2 + 1
| 2 | | 0 | | 3 | --prev 3 + 0
--- --- ---
CREATE TABLE #t ( ID INT, R INT, S INT )
INSERT INTO #t
VALUES ( 1, 1, 2 ),
( 2, 2, 4 ),
( 3, 3, 6 ),
( 4, 1, 2 ),
( 5, 2, 4 ),
( 6, 3, 6 ),
( 7, 4, 8 ),
( 8, 5, 10 ),
( 9, 1, 2 ),
( 10, 2, 4 ),
( 11, 3, 6 ),
( 12, 4, 8 );
For MSSQL 2008:
WITH cte1
AS ( SELECT ID ,
CASE WHEN R = 1 THEN 1
ELSE 0
END AS R ,
S
FROM #t
),
cte2
AS ( SELECT ID ,
( SELECT SUM(R)
FROM cte1 ci
WHERE ci.ID <= co.ID
) AS R ,
S
FROM cte1 co
)
SELECT * ,
( SELECT SUM(S)
FROM cte2 ci
WHERE ci.R = co.R
AND ci.ID <= co.ID
)
FROM cte2 co
For MSSQL 2012:
WITH cte
AS ( SELECT ID ,
SUM(CASE WHEN R = 1 THEN 1
ELSE 0
END) OVER ( ORDER BY ID ) AS R ,
S
FROM #t
)
SELECT * ,
SUM(s) OVER ( PARTITION BY R ORDER BY ID ) AS T
FROM cte
Output:
ID R S T
1 1 2 2
2 1 4 6
3 1 6 12
4 2 2 2
5 2 4 6
6 2 6 12
7 2 8 20
8 2 10 30
9 3 2 2
10 3 4 6
11 3 6 12
12 3 8 20
EDIT:
One more way. Judging by the execution plan, this looks much better than the first example:
SELECT * ,
CASE WHEN R = 1 THEN S
ELSE ( SELECT SUM(S)
FROM #t it
WHERE it.ID <= ot.ID
AND it.ID >= ( SELECT MAX(ID)
FROM #t iit
WHERE iit.ID < ot.ID
AND iit.R = 1
)
)
END
FROM #t ot
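The two-step partition idea (flag, cumulative sum of the flag, then a running total per partition) can be sketched in Python as follows. This is only an illustration of the logic, with variable names chosen for this sketch, using the sample data from the question.

```python
from itertools import accumulate

row_ids = [1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 4]
amounts = [2, 4, 6, 2, 4, 6, 8, 10, 2, 4, 6, 8]

# Step 1: flag the rows where a new block starts (ROW_ID = 1).
flags = [1 if r == 1 else 0 for r in row_ids]

# Step 2: the cumulative sum of the flags numbers the partitions.
partitions = list(accumulate(flags))

# Step 3: running total of the amounts within each partition.
running = {}
totals = []
for part, amt in zip(partitions, amounts):
    running[part] = running.get(part, 0) + amt
    totals.append(running[part])

print(partitions)
print(totals)
```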

How to Retrieve Rank Based on Total Mark in SQLite Table

Say I have a table like this:
S_id |ca1 |ca2 |exam
1 | 08 | 12 | 35
1 | 02 | 14 | 32
1 | 08 | 12 | 20
2 | 03 | 11 | 55
2 | 09 | 18 | 45
2 | 10 | 12 | 35
3 | 07 | 12 | 35
3 | 04 | 14 | 37
3 | 09 | 15 | 32
4 | 03 | 11 | 55
4 | 09 | 18 | 45
4 | 10 | 12 | 35
5 | 10 | 12 | 35
5 | 07 | 12 | 35
5 | 09 | 18 | 45
I want to select S_id, total and assign a rank on each student based on sum(ca1+ca2+exam) like the following:
S_id |total|rank
1 | 143 | 5
2 | 198 | 1
3 | 165 | 4
4 | 198 | 1
5 | 183 | 3
If two students have the same total, like S_id 2 and S_id 4 with rank 1, I want the next rank to jump to 3.
Thanks for helping.
Make the table:
sqlite> create table t (S_id, ca1, ca2, exam);
sqlite> insert into t values
...> ( 1 , 08 , 12 , 35 ),
...> ( 1 , 02 , 14 , 32 ),
...> ( 1 , 08 , 12 , 20 ),
...> ( 2 , 03 , 11 , 55 ),
...> ( 2 , 09 , 18 , 45 ),
...> ( 2 , 10 , 12 , 35 ),
...> ( 3 , 07 , 12 , 35 ),
...> ( 3 , 04 , 14 , 37 ),
...> ( 3 , 09 , 15 , 32 ),
...> ( 4 , 03 , 11 , 55 ),
...> ( 4 , 09 , 18 , 45 ),
...> ( 4 , 10 , 12 , 35 ),
...> ( 5 , 10 , 12 , 35 ),
...> ( 5 , 07 , 12 , 35 ),
...> ( 5 , 09 , 18 , 45 );
Make a temporary table with the total scores:
sqlite> create temp table tt
as select S_id, sum(ca1) + sum(ca2) + sum(exam) as total
from t group by S_id;
Use the temporary table to compute the ranks:
sqlite> select s.S_id, s.total,
(select count(*)+1 from tt as r where r.total > s.total) as rank
from tt as s;
1|143|5
2|198|1
3|165|4
4|198|1
5|183|3
Drop the temporary table:
sqlite> drop table tt;
ADDENDUM
With a recent change (2015-02-09) to SQLite, this formulation now works:
with tt (S_id, total) as
(select S_id, sum(ca1 + ca2 + exam) as total from t group by S_id)
select s.S_id, s.total,
(select count(*)+1 from tt as r where r.total > s.total) as rank
from tt as s;
Something like this maybe:
with tt(S_id,total) as (
select S_id, sum(ca1) + sum(ca2) + sum(exam)
from t
group by S_id
union
select null, 0
)
select s.S_id,
s.total,
(select count(*)+1
from tt as r
where r.total > s.total) as rank
from tt as s
where S_id is not null;
Per my standard Rank Rows answer, use a self join:
with tt (S_id, total) as
(select S_id, sum(ca1 + ca2 + exam) as total
from t group by S_id)
select S.S_id, S.total, 1+count(lesser.total) as RANK
from tt as S
left join tt as lesser
on S.total < lesser.total
group by S.S_id, S.total
order by S.total desc;
S_id total RANK
---------- ---------- ----------
2 198 1
4 198 1
5 183 3
3 165 4
1 143 5
You don't need a CTE; you could use a subquery instead, but you'd have to repeat it.
Using a SELECT clause in a SELECT clause to produce a column (as suggested elsewhere) is AFAIK nonstandard. A self-join is standard and should be easier for the query planner to optimize (if only for that reason). Also, the above query doesn't munge the data: it doesn't add a row to the CTE only to remove it in the main query.
I prefer the sum(ca1 + ca2 + exam) construction to adding the sums. That's how the question was posed, and it asks the system to do less work (only one summation). Sure, addition is commutative, but I wouldn't depend on the query optimizer to notice.
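All of the queries above use the same rule: a student's rank is 1 plus the number of students with a strictly greater total, which is what makes tied students share a rank and the next rank jump. A short Python sketch of that rule, with variable names chosen for this illustration and the sample data from the question:

```python
from collections import defaultdict

# Rows from the question: (S_id, ca1, ca2, exam).
marks = [
    (1, 8, 12, 35), (1, 2, 14, 32), (1, 8, 12, 20),
    (2, 3, 11, 55), (2, 9, 18, 45), (2, 10, 12, 35),
    (3, 7, 12, 35), (3, 4, 14, 37), (3, 9, 15, 32),
    (4, 3, 11, 55), (4, 9, 18, 45), (4, 10, 12, 35),
    (5, 10, 12, 35), (5, 7, 12, 35), (5, 9, 18, 45),
]

# Total per student, like sum(ca1 + ca2 + exam) ... group by S_id.
totals = defaultdict(int)
for s_id, ca1, ca2, exam in marks:
    totals[s_id] += ca1 + ca2 + exam

# Rank = 1 + number of students with a strictly greater total.
ranks = {
    s_id: 1 + sum(1 for other in totals.values() if other > total)
    for s_id, total in totals.items()
}

for s_id in sorted(totals):
    print(s_id, totals[s_id], ranks[s_id])
```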