Reset counter during output with row_number - sql

I have the following SQL Query part:
SELECT car,
grouper,
yearout,
rn
FROM (SELECT Row_number() OVER(partition BY grouper ORDER BY yearout) AS rn,
car,
grouper,
yearout
FROM (SELECT DISTINCT res.sn AS car,
res.groupin AS groupin,
Year(res.yearin) AS yearin
FROM (inner stuff)) res
WHERE Year(res.yearin) BETWEEN '2015' AND '2030'
ORDER BY res.groupin) temp) ALL
WHERE ALL.rn <= (SELECT taken
FROM (SELECT p1.nos - p2.gone AS taken,
yearout,
group_key
FROM (SELECT count(sn) AS nos,
group_key
FROM d_fm_cars
GROUP BY group_key) p1
JOIN (SELECT taken AS gone,
group_key_all,
yearout
FROM settings_keys) p2
ON p1.group_key = p2.group_key_all) counterin
WHERE counterin.yearout = ALL.yearout
AND counterin.group_key_all = ALL.grouper)
The values for taken are like
5
15
for grouper 1 for the years 2015 and 2016 the problem now is that the result i get from the query just runs 1-15 and not like it should from 1-5 and than again from 1-15
The available cars per year are
2015 3
2016 10
2017 25
so the output must run over the years limit and put 2 cars from 2015 into the 2015 output.
When partitioning with grouper and yearout that is not the case anymore. But leaving yearout out it just runs through and does not reset the counter at 5.
Any idea on how to fix this?
Thanks for any help you can provide.

Related

LAG function alternative. I need the results for the missing year in between

I have this table so far. However, I would like to obtain the results for 2019 which there are no records so it becomes 0. Are there any alternatives to the LAG funciton.
ID
Year
Year_Count
1
2018
10
1
2020
20
Whenever I use the LAG function in SQL it gives me the results for 2018. However, I would like to get 0 for 2019 and then 10 for 2018
LAG(YEAR_COUNT) OVER (PARTITION BY ID ORDER BY YEAR) AS previous_year_count
untested notepad scribble
CASE
WHEN 1 = YEAR - LAG(YEAR) OVER (PARTITION BY ID ORDER BY YEAR)
THEN LAG(YEAR_COUNT) OVER (PARTITION BY ID ORDER BY YEAR)
ELSE 0
END AS previous_year_count
I'll add on to Nick's comment here with an example.
The YEARS CTE here is creating that table of years as he suggested, the RECORDS table is matching the above posted. Then they get joined together with COALESCE to fill in the null values left by the LEFT JOIN (filled ID with 0, not sure what your case would be).
You would need to LEFT JOIN onto the YEAR table and select the YEAR variable from the YEAR table in the final query, otherwise you'd only end up with only 2018/2020 or those years and some null values
WITH
YEARS AS
(
SELECT 2016 AS YEAR UNION ALL
SELECT 2017 UNION ALL
SELECT 2018 UNION ALL
SELECT 2019 UNION ALL
SELECT 2020 UNION ALL
SELECT 2021 UNION ALL
SELECT 2022
)
,
RECORDS AS
(
SELECT 1 ID, 2018 YEAR, 10 YEAR_COUNT UNION ALL
SELECT 1, 2020, 20)
SELECT
COALESCE(ID, 0) AS ID,
Y.YEAR,
COALESCE(YEAR_COUNT, 0) AS YEAR_COUNT
FROM YEARS AS Y
LEFT JOIN RECORDS AS R
ON R.YEAR = Y.YEAR
Here is the dbfiddle so you can visualize - https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9e777ad925b09eb8ba299d610a78b999
Vertica SQL is not an available test environment, so this may not work directly but should at least get you on the right track.
The LAG function would not work to get 2019 for a few reasons
It's a window function and can only grab from data that is available - the default for LAG in your case appears to be 1 aka LAG(YEAR_COUNT, 1)
Statements in the select typically can't add any rows data back into a table, you would need to add in data with JOINs
If 2019 does exist in a prior table and you're using group by to get year count, it's possible that you have a where clause excluding the data.

Calculating growth and retention rate with sql

So I wrote a query to calculate the retention, new and returning student growth rate. The code below returns a result similar to this.
Row visit_month student_type numberofstd growth
1 2013 new 574 null
2 2014 new 220 -62%
3 2014 retained 442 245%
4 2015 new 199 -10%
5 2015 retained 533 21%
6 2016 new 214 8%
7 2016 retained 590 11%
8 2016 returning 1 -100%
Query I have tried.
with visit_log AS (
SELECT studentid,
cast(substr(session, 1, 4) as numeric) as visit_month,
FROM abaresult
GROUP BY 1,
2
ORDER BY 1,
2),
time_lapse_2 AS (
SELECT studentid,
Visit_month,
lag(visit_month, 1) over (partition BY studentid ORDER BY studentid, visit_month) lag
FROM visit_log),
time_diff_calculated_2 AS (
SELECT studentid,
visit_month,
lag,
visit_month - lag AS time_diff
FROM time_lapse_2),
student_categorized AS (
SELECT studentid,
visit_month,
CASE
WHEN time_diff=1 THEN 'retained'
WHEN time_diff>1 THEN 'returning'
WHEN time_diff IS NULL THEN 'new'
END AS student_type,
FROM time_diff_calculated_2)
SELECT visit_month,
student_type,
Count(distinct studentid) as numberofstd,
ROUND(100 * (COUNT(student_type) - LAG(COUNT(student_type), 1) OVER (ORDER BY student_type)) / LAG(COUNT(student_type), 1) OVER (ORDER BY student_type),0) || '%' AS growth
FROM student_categorized
group by 1,2
order by 1,2
The query above calculates the retention, new and returning rate based on the figures of the last session student_type category.
I am looking for a way to calculate these figures based on the total number of students in each visit_month and not from each category. Is there a way I can achieve this?
I am trying to get a table similar to this
Row visit_month student_type totalstd numberofstd growth
1 2013 new 574 574 null
2 2014 new 662 220 62%
3 2014 retained 662 442 22%
4 2015 new 732 199 10%
5 2015 retained 732 533 21%
6 2016 new 804 214 8%
7 2016 retained 804 590 11%
8 2016 returning 804 1 100%
Note:
The totalstd is the total number of student in each session and is gotten by new+retention+returning.
The growth calculation was assumed.
Please help!
Thank you.
While I do not have your source data, I am relying myself in the query you shared and the output results.
I created some extra code in order to output the desired result. I would like to point that I did not have access to BigQuery's compilation because I did not have the data. Thus, I have tried to prevent any possible errors the query myself. In addition, the queries between ** are unchanged and were copied from your code. Below is the code (it is a mix of yours and the extra bits I created):
#*****************************************************************
with visit_log AS (
SELECT studentid,
cast(substr(session, 1, 4) as numeric) as visit_month,
FROM abaresult
GROUP BY 1,
2
ORDER BY 1,
2),
time_lapse_2 AS (
SELECT studentid,
Visit_month,
lag(visit_month, 1) over (partition BY studentid ORDER BY studentid, visit_month) lag
FROM visit_log),
time_diff_calculated_2 AS (
SELECT studentid,
visit_month,
lag,
visit_month - lag AS time_diff
FROM time_lapse_2),
student_categorized AS (
SELECT studentid,
visit_month,
CASE
WHEN time_diff=1 THEN 'retained'
WHEN time_diff>1 THEN 'returning'
WHEN time_diff IS NULL THEN 'new'
END AS student_type,
FROM time_diff_calculated_2)
#**************************************************************
#Code I added
#each unique visit_month will have its count
WITH total_stud AS (
SELECT visit_month, count(distinct studentid) as totalstd FROM visit_log
GROUP BY 1
ORDER BY visit_month
),
#After you have your student_categorized temp table, create another one
#It will have the count of the number of students per visit_month per student_type
number_std_monthType AS (
SELECT visit_month,student_type, Count(distinct studentid) as numberofstd from student_categorized
GROUP BY 1, 2
),
#You will have one row per combination of visit_month and student_type
student_categorized2 AS(
SELECT DISTINCT visit_month,student_type FROM student_categorized2
GROUP BY 1,2
),
#before calculation, create the table with all the necessary data
#you have the desired table without the growth
#notice that I used two keys to join t1 and t3 so the results are correct
final_t AS (
SELECT t1.visit_month,
t1.student_type,
t2.totalstd as totalstd,
t3.numberofstd
FROM student_categorized2 t1
LEFT JOIN total_stud AS t2 ON t1.visit_month = t2.visit_month
LEFT JOIN number_std_monthType t3 ON (t1.visit_month = t3.visit_month and t1.student_type = t3.student_type)
ORDER BY
)
#Now all the necessary values to calculate growth are in the temp table final_t
SELECT visit_month, student_type, totalstd, numberofstd,
ROUND(100 * (totalstd - LAG(totalstd) OVER (PARTITION BY visit_month ORDER BY visit_month ASC) /LAG(totalstd) OVER (PARTITION BY visit_month ORDER BY visit_month ASC) || '%' AS growth
FROM final_t
Notice that I used LEFT JOIN in order to have the proper counts in the final table, once each count was calculated in a different temp table. Also, I did not use your final SELECT statement.
If you have any issues with the code, do not hesitate to ask.

Oracle SQL Group by and sum with multiple conditions

I attached a capture of two tables:
- the left table is a result of others "Select" query
- the right table is the result I want from the left table
The right table can be created following the next conditions:
When the same Unit have all positive or all negative
energy values, the result remain the same
When the same Unit have positive and negative energy values then:
Make a sum of all Energy for that Unit(-50+15+20 = -15) and then take the maximum of absolut value for the Energy.e.g. max(abs(energy))=50 and take the price for that value.
I use SQL ORACLE.
I realy appreciate the help in this matter !
http://sqlfiddle.com/#!4/eb85a/12
This returns desired result:
signs CTE finds out whether there are positive/negative values, as well as maximum ABS energy value
then, there's union of two selects: one that returns "original" rows (if count of distinct signs is 1), and one that returns "calculated" values, as you described
SQL> with
2 signs as
3 (select unit,
4 count(distinct sign(energy)) cnt,
5 max(abs(energy)) max_abs_ene
6 from tab
7 group by unit
8 )
9 select t.unit, t.price, t.energy
10 from tab t join signs s on t.unit = s.unit
11 where s.cnt = 1
12 union all
13 select t.unit, t2.price, sum(t.energy)
14 from tab t join signs s on t.unit = s.unit
15 join tab t2 on t2.unit = s.unit and abs(t2.energy) = s.max_abs_ene
16 where s.cnt = 2
17 group by t.unit, t2.price
18 order by unit;
UNIT PRICE ENERGY
-------------------- ---------- ----------
A 20 -50
A 50 -80
B 13 -15
SQL>
Though, what do you expect if there was yet another "B" unit row with energy = +50? Then two rows would have the same MAX(ABS(ENERGY)) value.
A union all might be the simplest solution:
with t as (
select t.*,
max(energy) over (partition by unit) as max_energy,
min(energy) over (partition by unit) as min_energy
from t
)
select unit, price, energy
from t
where max_energy > 0 and min_energy > 0 or
max_energy < 0 and min_enery < 0
union all
select unit,
max(price) keep (dense_rank first order by abs(energy)),
sum(energy)
from t
where max_energy > 0 and min_energy < 0
group by unit;

SQL Server find the missing number

I have a table like below
id name year
--------------
1 A 2000
2 B 2000
2 B 2000
2 B 2000
5 C 2000
1 D 2001
3 E 2001
as well as you see in the year 2000 we missed id '3' and id '4' and in the year 2001 we missed id '2'. I want to generate my second table which includes missed items.
2nd table :
From-id to-id name year
--------------------------------
3 4 null 2000
2 null null 2001
Which method in a SQL query can solve my problem?
Gaps and Islands in Sequences is the name of this problem. you read this article
Here's something to get you started:
WITH cte AS
(
SELECT *
FROM
(VALUES
(1),(2),(3),(4),(5)
) Tally(number)
), cte2 as
(
SELECT DISTINCT [year]
FROM
(VALUES
(2000),(2000),(2001)
)tbl([year])
), cte3 as
(
SELECT *
FROM cte
CROSS JOIN cte2
)
SELECT *
FROM cte3
LEFT OUTER JOIN YourTable ON cte3.number = YourTable.id AND cte3.[year] = YourTable[year)
A few notes: please avoid using reserved keywords as column names (such as year).
Furthermore, since I didn't know how you'd handle multiple missing ranges I did not format the output to reflect a range. For example: What would be your expected output if only one row with id=3 would be in your table?
I'd probably use ROW_NUMBER for this
This query gives you what the correct ID should be (if I interpreted your question right):
SELECT
ROW_NUMBER() OVER (PARTITION BY yr ORDER BY name, yr) as "Correct ID", *
FROM misorder
It assigns a row number (so a number starting from 1 increasing by 1 every time the year is the same).
And to let you know which ones are missing I think this should be a working solution:
WITH missing AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY yr ORDER BY name, yr) as "Correct ID", *
FROM misorder
)
SELECT * FROM missing
WHERE "Correct ID" != "id"
It takes the first query as a base to select only those records where the assumed correct ID is not equal to the currently assigned ID. You can turn this into a query to include the ranges you mentioned, but not sure if that is really necessary.
Hope this helps.

GROUP values separated by specific records

I want to make a specific counter which will raise by one after a specific record is found in a row.
time event revenue counter
13.37 START 20 1
13.38 action A 10 1
13.40 action B 5 1
13.42 end 1
14.15 START 20 2
14.16 action B 5 2
14.18 end 2
15.10 START 20 3
15.12 end 3
I need to find out total revenue for every visit (actions between START and END). I was thinking the best way would be to set a counter like this:
so I could group events. But if you have a better solution, I would be grateful.
You can use a query similar to the following:
with StartTimes as
(
select time,
startRank = row_number() over (order by time)
from events
where event = 'START'
)
select e.*, counter = st.startRank
from events e
outer apply
(
select top 1 st.startRank
from StartTimes st
where e.time >= st.time
order by st.time desc
) st
SQL Fiddle with demo.
May need to be updated based on the particular characteristics of the actual data, things like duplicate times, missing events, etc. But it works for the sample data.
SQL Server 2012 supports an OVER clause for aggregates, so if you're up to date on version, this will give you the counter you want:
count(case when eventname='START' then 1 end) over (order by eventtime)
You could also use the latest START time instead of a counter to group by, like this:
with t as (
select
*,
max(case when eventname='START' then eventtime end)
over (order by eventtime) as timeStart
from YourTable
)
select
timeStart,
max(eventtime) as timeEnd,
sum(revenue) as totalRevenue
from t
group by timeStart;
Here's a SQL Fiddle demo using the schema Ian posted for his solution.