The array of numbers should be converted to ranges to reduce the number of records.
Here is an example:
Current:
L2PARAM_ID            A_AXIS_RANGE
CL_L2_Params_2688303  10
CL_L2_Params_2688303  20
CL_L2_Params_2688303  70
CL_L2_Params_2688303  80
CL_L2_Params_2688303  90
CL_L2_Params_2688303  100
CL_L2_Params_2688303  110
CL_L2_Params_2688303  160
CL_L2_Params_2688303  170
CL_L2_Params_2688303  180
Needed:
L2PARAM_ID            A_AXIS_RANGE_FROM  A_AXIS_RANGE_TO  A_AXIS_RANGE_STEP
CL_L2_Params_2688303  10                 20               10
CL_L2_Params_2688303  70                 110              10
CL_L2_Params_2688303  160                180              10
This seems quite simple with window functions, but so far I have found it hard to get the 3 'groups'. Any help would be appreciated.
What I have tried so far:
with axis_by_step as (
select
L2PARAM_ID,
A_AXIS_RANGE,
LEAD (A_AXIS_RANGE) OVER (PARTITION BY L2PARAM_ID ORDER BY A_AXIS_RANGE) - A_AXIS_RANGE AS STEP_3,
COUNT (A_AXIS_RANGE) OVER (PARTITION BY L2PARAM_ID) cnt,
CASE WHEN
COUNT (A_AXIS_RANGE) OVER (PARTITION BY L2PARAM_ID) = 1
THEN A_AXIS_RANGE
ELSE LEAD (A_AXIS_RANGE) OVER (PARTITION BY L2PARAM_ID ORDER BY A_AXIS_RANGE)
END AS TO_3
from PMDM_LRSV_CL_L2_PARAM_PREP_MULTI_AXIS
)
select *
from axis_by_step
In this specific gaps-and-islands problem you need to:
1. check whether A_AXIS_RANGE changes by more than 10 (your fixed step) between consecutive rows, and flag those rows with 1
2. compute a running sum over the flag value, which generates a new partitioning inside the "L2PARAM_ID" partition
3. aggregate on the partition you just generated, together with "L2PARAM_ID"
WITH cte1 AS (
SELECT *,
CASE WHEN A_AXIS_RANGE
- LAG(A_AXIS_RANGE) OVER(PARTITION BY L2PARAM_ID ORDER BY A_AXIS_RANGE) > 10
THEN 1 ELSE 0
END AS change_part
FROM tab
), cte2 AS (
SELECT *,
SUM(change_part) OVER(PARTITION BY L2PARAM_ID ORDER BY A_AXIS_RANGE) AS part
FROM cte1
)
SELECT L2PARAM_ID,
MIN(A_AXIS_RANGE) AS A_AXIS_RANGE_FROM,
MAX(A_AXIS_RANGE) AS A_AXIS_RANGE_TO,
10 AS A_AXIS_RANGE_STEP
FROM cte2
GROUP BY L2PARAM_ID, part
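The same three steps (flag the gaps, take a running sum, aggregate per group) can be sketched outside SQL as well. The following is a minimal Python illustration, not a replacement for the query: the function name `compress_to_ranges` is hypothetical, and it assumes a single L2PARAM_ID with a fixed step of 10.

```python
from itertools import groupby


def compress_to_ranges(values, step=10):
    """Collapse axis values into (from, to, step) ranges (hypothetical helper)."""
    island = 0        # running sum of the gap flags, like the "part" column
    previous = None
    labelled = []
    for value in sorted(values):
        # Flag a new island when the jump from the previous value exceeds the step.
        if previous is not None and value - previous > step:
            island += 1
        labelled.append((island, value))
        previous = value

    ranges = []
    # Aggregate per island: the first value is the MIN, the last value is the MAX.
    for _, members in groupby(labelled, key=lambda pair: pair[0]):
        group = [value for _, value in members]
        ranges.append((group[0], group[-1], step))
    return ranges


axis = [10, 20, 70, 80, 90, 100, 110, 160, 170, 180]
print(compress_to_ranges(axis))
# [(10, 20, 10), (70, 110, 10), (160, 180, 10)]
```

The `island` counter plays the role of the running SUM over `change_part`: it only increases when a gap larger than the step is found.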
I want to write a select query that selects distinct rows of data progressively.
To explain with an example:
Say I have 5000 accounts selected for loan repayment, ordered by outstanding amount in descending order (account 1 has the highest outstanding amount, account 5000 the lowest).
I want to select 1000 unique accounts 5 times such that the total outstanding amount is similar in all 5 selections.
I have tried a few methods, such as selecting rownums based on odd/even, but they only work for up to 2 distributions. I was expecting something more like an arithmetic progression (A.P.) that selects data progressively.
A naïve method of splitting a set into (for example) 5 bins, numbered 0 to 4, is to give each row a unique sequential numeric index and then, in order of size, assign the first 10 rows to bins 0,1,2,3,4,4,3,2,1,0 and repeat that pattern for each additional set of 10 rows:
WITH indexed_values (value, rn) AS (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value DESC) - 1
FROM table_name
),
assign_bins (value, rn, bin) AS (
SELECT value,
rn,
CASE WHEN MOD(rn, 2 * 5) >= 5
THEN 5 - MOD(rn, 5) - 1
ELSE MOD(rn, 5)
END
FROM indexed_values
)
SELECT bin,
COUNT(*) AS num_values,
SUM(value) AS bin_size
FROM assign_bins
GROUP BY bin
Which, for some random data:
CREATE TABLE table_name ( value ) AS
SELECT FLOOR(DBMS_RANDOM.VALUE(1, 1000001)) FROM DUAL CONNECT BY LEVEL <= 1000;
May output:
BIN  NUM_VALUES  BIN_SIZE
0    200         100012502
1    200         100004633
2    200         99980342
3    200         99976774
4    200         100005756
It will not get the bins to have equal values but it is relatively simple and will get a close approximation if your values are approximately evenly distributed.
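The back-and-forth assignment can also be checked on a small data set. This is a minimal Python sketch of the same idea; the function names and the sample amounts (the integers 1 to 20) are made up for illustration.

```python
def serpentine_bin(rank, bins=5):
    """Bin for a zero-based rank: 0,1,2,3,4,4,3,2,1,0, then repeating."""
    position = rank % (2 * bins)
    # First half of each block counts up, second half counts back down.
    return position if position < bins else 2 * bins - 1 - position


def split_into_bins(values, bins=5):
    """Sum of the values that land in each bin, largest values assigned first."""
    totals = [0] * bins
    for rank, value in enumerate(sorted(values, reverse=True)):
        totals[serpentine_bin(rank, bins)] += value
    return totals


outstanding = list(range(1, 21))  # hypothetical outstanding amounts
print([serpentine_bin(rank) for rank in range(10)])
# [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]
print(split_into_bins(outstanding))
# [42, 42, 42, 42, 42]
```

The totals come out exactly equal here only because the sample amounts are evenly spaced; with real data the bins will be close but not identical.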
If you want to select values from a certain bin then:
WITH indexed_values (value, rn) AS (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value DESC) - 1
FROM table_name
),
assign_bins (value, rn, bin) AS (
SELECT value,
rn,
CASE WHEN MOD(rn, 2 * 5) >= 5
THEN 5 - MOD(rn, 5) - 1
ELSE MOD(rn, 5)
END
FROM indexed_values
)
SELECT value
FROM assign_bins
WHERE bin = 0
Table has values
10 20 30
40 50 60
70 80 90
We need to show the maximum of each row and the maximum of each column.
Result should be
RowWiseMax, ColumnWiseMax
30, 70
60, 80
90, 90
Here's a solution to the puzzle as presented. I've had to assume that the number of columns is not dynamic. This solution uses T-SQL; it's not clear what your database platform is, but it should hopefully port if yours is different. It uses a table named "T" with columns c1, c2, c3:
select RowWiseMax,
case Row_Number() over (order by RowWiseMax)
when 1 then Max(c1) over()
when 2 then Max(c2) over()
when 3 then Max(c3) over()
end ColumnWiseMax
from T
cross apply (
select Max(v) RowWiseMax
from (values (c1), (c2), (c3))v(v)
)x
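To make the expected pairing explicit, here is a minimal Python sketch of the same result. It assumes a square table held as a list of rows and pairs the i-th row maximum with the i-th column maximum in table order, which is a simplification of the Row_Number() ordering used in the query.

```python
def row_and_column_max(table):
    """Pair each row's maximum with the maximum of the column at the same index."""
    row_max = [max(row) for row in table]
    # zip(*table) transposes the rows into columns.
    col_max = [max(col) for col in zip(*table)]
    return list(zip(row_max, col_max))


rows = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
print(row_and_column_max(rows))
# [(30, 70), (60, 80), (90, 90)]
```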
So I wrote a query to calculate the retention, new and returning student growth rate. The code below returns a result similar to this.
Row visit_month student_type numberofstd growth
1 2013 new 574 null
2 2014 new 220 -62%
3 2014 retained 442 245%
4 2015 new 199 -10%
5 2015 retained 533 21%
6 2016 new 214 8%
7 2016 retained 590 11%
8 2016 returning 1 -100%
Query I have tried.
with visit_log AS (
SELECT studentid,
cast(substr(session, 1, 4) as numeric) as visit_month,
FROM abaresult
GROUP BY 1,
2
ORDER BY 1,
2),
time_lapse_2 AS (
SELECT studentid,
Visit_month,
lag(visit_month, 1) over (partition BY studentid ORDER BY studentid, visit_month) lag
FROM visit_log),
time_diff_calculated_2 AS (
SELECT studentid,
visit_month,
lag,
visit_month - lag AS time_diff
FROM time_lapse_2),
student_categorized AS (
SELECT studentid,
visit_month,
CASE
WHEN time_diff=1 THEN 'retained'
WHEN time_diff>1 THEN 'returning'
WHEN time_diff IS NULL THEN 'new'
END AS student_type,
FROM time_diff_calculated_2)
SELECT visit_month,
student_type,
Count(distinct studentid) as numberofstd,
ROUND(100 * (COUNT(student_type) - LAG(COUNT(student_type), 1) OVER (ORDER BY student_type)) / LAG(COUNT(student_type), 1) OVER (ORDER BY student_type),0) || '%' AS growth
FROM student_categorized
group by 1,2
order by 1,2
The query above calculates the retention, new and returning rate based on the figures of the last session student_type category.
I am looking for a way to calculate these figures based on the total number of students in each visit_month and not from each category. Is there a way I can achieve this?
I am trying to get a table similar to this
Row visit_month student_type totalstd numberofstd growth
1 2013 new 574 574 null
2 2014 new 662 220 62%
3 2014 retained 662 442 22%
4 2015 new 732 199 10%
5 2015 retained 732 533 21%
6 2016 new 804 214 8%
7 2016 retained 804 590 11%
8 2016 returning 804 1 100%
Note:
The totalstd is the total number of students in each session, obtained as new + retained + returning.
The growth calculation was assumed.
Please help!
Thank you.
While I do not have your source data, I am relying on the query you shared and its output.
I wrote some extra code to produce the desired result. Note that I could not run it in BigQuery because I did not have the data, so I have tried to avoid errors by reviewing the query myself. The queries between the ** comment lines were copied from your code. Below is the code (a mix of yours and the extra bits I created):
#*****************************************************************
with visit_log AS (
SELECT studentid,
cast(substr(session, 1, 4) as numeric) as visit_month,
FROM abaresult
GROUP BY 1,
2
ORDER BY 1,
2),
time_lapse_2 AS (
SELECT studentid,
Visit_month,
lag(visit_month, 1) over (partition BY studentid ORDER BY studentid, visit_month) lag
FROM visit_log),
time_diff_calculated_2 AS (
SELECT studentid,
visit_month,
lag,
visit_month - lag AS time_diff
FROM time_lapse_2),
student_categorized AS (
SELECT studentid,
visit_month,
CASE
WHEN time_diff=1 THEN 'retained'
WHEN time_diff>1 THEN 'returning'
WHEN time_diff IS NULL THEN 'new'
END AS student_type,
FROM time_diff_calculated_2),
#**************************************************************
#Code I added (it continues the same WITH clause, so no second WITH keyword)
#each unique visit_month will have its count
total_stud AS (
SELECT visit_month, count(distinct studentid) as totalstd FROM visit_log
GROUP BY 1
ORDER BY visit_month
),
#After you have your student_categorized temp table, create another one
#It will have the count of the number of students per visit_month per student_type
number_std_monthType AS (
SELECT visit_month,student_type, Count(distinct studentid) as numberofstd from student_categorized
GROUP BY 1, 2
),
#You will have one row per combination of visit_month and student_type
student_categorized2 AS(
SELECT DISTINCT visit_month,student_type FROM student_categorized
GROUP BY 1,2
),
#before calculation, create the table with all the necessary data
#you have the desired table without the growth
#notice that I used two keys to join t1 and t3 so the results are correct
final_t AS (
SELECT t1.visit_month,
t1.student_type,
t2.totalstd as totalstd,
t3.numberofstd
FROM student_categorized2 t1
LEFT JOIN total_stud AS t2 ON t1.visit_month = t2.visit_month
LEFT JOIN number_std_monthType t3 ON (t1.visit_month = t3.visit_month and t1.student_type = t3.student_type)
)
#Now all the necessary values to calculate growth are in the temp table final_t
SELECT visit_month, student_type, totalstd, numberofstd,
ROUND(100 * (totalstd - LAG(totalstd) OVER (PARTITION BY visit_month ORDER BY visit_month ASC)) / LAG(totalstd) OVER (PARTITION BY visit_month ORDER BY visit_month ASC)) || '%' AS growth
FROM final_t
Notice that I used LEFT JOIN in order to have the proper counts in the final table, since each count was calculated in a different temp table. Also, I did not use your final SELECT statement.
If you have any issues with the code, do not hesitate to ask.
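Since the query could not be run against real data, the categorisation and the per-year totals can be checked on a tiny sample first. The following Python sketch is only an illustration: the function names and sample visits are made up, and the last column is each category's share of that year's total, which is one possible reading of "based on the total number of students", not the asker's growth formula.

```python
from collections import Counter, defaultdict


def categorize(visits):
    """visits: iterable of (studentid, year). Returns {(year, type): count}."""
    years_by_student = defaultdict(set)
    for student, year in visits:
        years_by_student[student].add(year)

    counts = Counter()
    for years in years_by_student.values():
        previous = None  # plays the role of LAG(visit_month)
        for year in sorted(years):
            if previous is None:
                kind = "new"
            elif year - previous == 1:
                kind = "retained"
            else:
                kind = "returning"
            counts[(year, kind)] += 1
            previous = year
    return counts


def with_totals(counts):
    """Rows of (year, type, totalstd, numberofstd, share of the year's total in %)."""
    totals = Counter()
    for (year, _), n in counts.items():
        totals[year] += n
    return [
        (year, kind, totals[year], n, round(100 * n / totals[year]))
        for (year, kind), n in sorted(counts.items())
    ]


# Hypothetical sample: three students over three years.
sample = [("a", 2013), ("b", 2013), ("a", 2014), ("c", 2014), ("b", 2015), ("c", 2015)]
for row in with_totals(categorize(sample)):
    print(row)
```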
Below is the data in my table. I need to compute the difference between MINMAXPOINTS from every next row for a specific position and if positive, record output as 1 and if negative, record output as 0 and then sum up all the output values into a locally declared variable.
Id Date Position MINMAXPOINTS
1 1/11/2016 ABOVE 82.4
2 1/5/2016 ABOVE 81.75
3 12/1/2015 ABOVE 72.79
4 10/28/2015 ABOVE 76.7
5 10/20/2015 ABOVE 80
6 1/15/2016 BELOW 68.4
7 1/7/2016 BELOW 72.29
8 12/14/2015 BELOW 61.25
9 11/10/2015 BELOW 60.89
10 10/27/2015 BELOW 73.29
11 9/4/2015 BELOW 54.35
The above data has been PARTITIONed by Position and then ORDERed by Date DESC.
ROW_NUMBER() OVER (PARTITION BY CTE_MinMax.Position ORDER BY [Date] DESC) AS MINMAXPOINTS
So the algorithm would be something like:
Compute sum of differences of MINMAXPOINTS for every next row like:
(1-2) + (2-3) + (3-4) + (4-5) for 'ABOVE' and
(6-7) + (7-8) + (8-9) + (9-10) + (10-11) for 'BELOW'.
Depending on the sign of each number from differences above, we get:
(1) + (1) + (-1) + (-1) for 'ABOVE' and
(-1) + (1) + (1) + (-1) + (1) for 'BELOW'.
Next, we total all the values for 'ABOVE' and 'BELOW' and store 1 as the grand total in a locally declared variable.
In the above scenario the number of subtractions should be configurable. So, if I input 2, then only 2 pairs of Positions should be summed up like below:
(1-2) + (2-3) for 'ABOVE' and
(6-7) + (7-8) for 'BELOW'
Resulting in a grand total of 3.
It'd be totally awesome if this could be achieved just using - WITH CTEs or something without the use of cursors or without creation of any tables if possible.
Any help will be appreciated!.
WITH temp
AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY Position, [Date] DESC) AS srn
FROM Table1
),
temp1
AS
(
SELECT t.*, ROW_NUMBER() OVER(PARTITION BY Position ORDER BY srn) AS srn1
FROM
(
SELECT t1.*, (t1.MINMAXPOINTS - t2.MINMAXPOINTS) AS Diff
FROM temp t1
LEFT JOIN temp t2
ON t1.srn = t2.srn - 1 AND t1.Position = t2.Position
)t
)
SELECT Position, SUM(Diff)
FROM temp1
WHERE
srn1 <= 2 -- 3, 4 any number upto which you need to calculate total
GROUP BY Position
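The query above sums the raw differences per Position, whereas the question scores the sign of each difference. A minimal Python sketch of that sign-based scoring follows. The names `sign_score` and `grand_total` are hypothetical, and `negative_score` is an assumed parameter added because the question scores a negative difference as 0 in one place and as -1 in another.

```python
def sign_score(points, pairs=None, negative_score=-1):
    """Score consecutive differences: 1 if positive, negative_score otherwise."""
    diffs = [current - following for current, following in zip(points, points[1:])]
    if pairs is not None:
        diffs = diffs[:pairs]  # only the first `pairs` subtractions
    # A zero difference is scored like a negative one in this sketch.
    return sum(1 if diff > 0 else negative_score for diff in diffs)


def grand_total(points_by_position, pairs=None, negative_score=-1):
    """Add up the scores of every position."""
    return sum(
        sign_score(points, pairs, negative_score)
        for points in points_by_position.values()
    )


# MINMAXPOINTS from the question, already ordered by Date descending.
data = {
    "ABOVE": [82.4, 81.75, 72.79, 76.7, 80],
    "BELOW": [68.4, 72.29, 61.25, 60.89, 73.29, 54.35],
}
print(grand_total(data))                             # 1
print(grand_total(data, pairs=2, negative_score=0))  # 3
```

With -1 for a negative difference and all pairs, the total is 1; with 0 for a negative difference and 2 pairs, the total is 3. These are the two grand totals quoted in the question.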
So I am really happy being able to rank results based on effective dates, but currently I'm having an issue where one data element repeats (POD) while another changes based on EFFDT (DEPT).
I only want to rank unique values for Pod, and later Dept. However Pod is based on Dept, which changes more frequently. The below code gives me:
EENBR PodRank POD DeptRank DeptNbr DeptEffdt
100 1 73 1 12420 4/11/2005
100 2 73 2 12560 5/22/2005
100 3 73 3 12501 6/24/2007
200 1 12 1 50768 3/14/2005
200 2 13 2 10949 9/9/2012
300 1 73 1 12450 3/21/2005
300 2 73 2 12471 12/25/2005
300 3 73 3 12581 12/21/2008
300 4 73 4 12585 6/6/2010
300 5 73 5 12432 5/19/2013
SELECT DISTINCT
AL4.FULL_NAME,
AL4.EMPLOYEE_NUMBER,
dense_rank() over (partition by AL4.EMPLOYEE_NUMBER
order by AL3.EFFECTIVE_START_DATE) as POD_RANKING,
AL7.POD_NBR as POD,
row_number() over (partition by AL4.EMPLOYEE_NUMBER
order by AL3.EFFECTIVE_START_DATE) as DEPT_RANKING,
AL3.RECORDVALUE AS DEPT_NUMBER,
AL3.EFFECTIVE_START_DATE AS "DEPT EFFECTIVE DATE"
FROM T1 AL3,
T2 AL4,
T3 AL7
WHERE AL4.PERSON_ID = AL3.PERSON_ID
AND AL4.EMPLOYEE_NUMBER = AL3.EMPLOYEE_NUMBER
AND AL3.RECORDTYPE = 'DEPARTMENT_NUMBER'
AND AL7.DEPT_NBR = AL3.RECORDVALUE
Order By AL4.Employee_Number;
Is there a function that only ranks unique values?
The function you are looking for is the analytic function dense_rank():
dense_rank() over (partition by eenbr order by pod) as ranking
This is the simplest way to get what you want. You can just add it in the select clause of your query.
There's no function for this, but you can get the result when you use nested window functions:
SELECT dt.*,
SUM(flag) OVER (PARTITION BY EMPLOYEE_NUMBER
ORDER BY "DEPT EFFECTIVE DATE") AS POD_RANKING
FROM
(
SELECT
AL4.FULL_NAME,
AL4.EMPLOYEE_NUMBER,
AL7.POD_NBR AS POD,
ROW_NUMBER() OVER (PARTITION BY AL4.EMPLOYEE_NUMBER
ORDER BY AL3.EFFECTIVE_START_DATE) AS DEPT_RANKING,
AL3.RECORDVALUE AS DEPT_NUMBER,
AL3.EFFECTIVE_START_DATE AS "DEPT EFFECTIVE DATE",
CASE WHEN ROW_NUMBER()
OVER (PARTITION BY AL4.EMPLOYEE_NUMBER,AL7.POD_NBR
ORDER BY AL3.EFFECTIVE_START_DATE) = 1 THEN 1 ELSE 0 END AS flag
FROM T1 AL3,
T2 AL4,
T3 AL7
WHERE AL4.PERSON_ID = AL3.PERSON_ID
AND AL4.EMPLOYEE_NUMBER = AL3.EMPLOYEE_NUMBER
AND AL3.RECORDTYPE = 'DEPARTMENT_NUMBER'
AND AL7.DEPT_NBR = AL3.RECORDVALUE
) dt
ORDER BY dt.EMPLOYEE_NUMBER;
Edit:
Ok, I noticed this is an overly complex version of a simple DENSE_RANK with a different order, shortly before Gordon posted his answer :-)
dense_rank() over (partition by AL4.EMPLOYEE_NUMBER order by AL7.POD_NBR)
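The two approaches agree on the sample data but are not the same calculation in general. The Python sketch below models both for one employee; the function names are hypothetical and the last pod sequence (73, 12, 73) is invented to show where they differ.

```python
def dense_rank(values):
    """1-based rank of each value among the distinct values, ordered by value."""
    order = {value: rank for rank, value in enumerate(sorted(set(values)), start=1)}
    return [order[value] for value in values]


def distinct_seen_so_far(values):
    """Running sum of a 'first occurrence' flag, with rows already in date order."""
    seen = set()
    running = []
    for value in values:
        seen.add(value)            # the flag is 1 only when the value is new
        running.append(len(seen))
    return running


print(dense_rank([73, 73, 73]), distinct_seen_so_far([73, 73, 73]))
# [1, 1, 1] [1, 1, 1]
print(dense_rank([12, 13]), distinct_seen_so_far([12, 13]))
# [1, 2] [1, 2]
print(dense_rank([73, 12, 73]), distinct_seen_so_far([73, 12, 73]))
# [2, 1, 2] [1, 2, 2]
```

DENSE_RANK ordered by pod ranks by the pod value itself, while the flag and running sum counts how many distinct pods have appeared up to each effective date.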