Is there a way to group this data? - sql

The data looks like this:
1
2
3
1
2
2
2
3
1
5
4
1
2
Whenever there is a 1, it marks the beginning of a group, which includes all the elements up to (but not including) the next 1. So here:
1 2 3 - group 1
1 2 2 2 3 - group 2
and so on.
What would be the SQL query to show the average for every such group?
I could not figure out how to group them without using for loops or PL/SQL code.
The result should have two columns: one with the actual data and a second with the average value:
1 - avg value of 1,2,3
2
3
1 - avg value of 1,2,2,2,3
2
2
2
3
1 - avg value of 1,5,4
5
4
1 - avg value of 1,2
2

SQL tables represent unordered sets. There is no ordering, unless a column specifies the ordering. Let me assume that you have such a column.
You can identify the groups using a cumulative sum:
select t.*,
       sum(case when t.col = 1 then 1 else 0 end) over (order by ?) as grp
from t;
Here, ? stands for the column that specifies the ordering.
You can then calculate the average using aggregation:
select grp, avg(col)
from (select t.*,
             sum(case when t.col = 1 then 1 else 0 end) over (order by ?) as grp
      from t
     ) t
group by grp;
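To get the per-row layout asked for in the question (each value next to its group's average), the same grp column can feed a window average instead of a GROUP BY. The following is only a minimal sketch, run through Python's built-in sqlite3 module so that it is self-contained (window functions need SQLite 3.25 or later). The id ordering column and the table name t are assumptions, since the question does not say what orders the rows.

```python
import sqlite3

values = [1, 2, 3, 1, 2, 2, 2, 3, 1, 5, 4, 1, 2]

conn = sqlite3.connect(":memory:")
# "id" is a hypothetical ordering column; the original table must supply its own.
conn.execute("create table t (id integer primary key, col integer)")
conn.executemany("insert into t (id, col) values (?, ?)",
                 list(enumerate(values, start=1)))

query = """
select col,
       avg(col) over (partition by grp) as grp_avg
from (select t.*,
             -- running count of 1s seen so far = group number
             sum(case when col = 1 then 1 else 0 end) over (order by id) as grp
      from t
     ) t
order by id
"""
rows = conn.execute(query).fetchall()

for col, grp_avg in rows:
    print(col, round(grp_avg, 3))
```

Every row keeps its original value, and rows in the same group share one average.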

Related

Finding adjacent column values from the last non-null value of a certain column in Snowflake (SQL) using partition by

Say I have the following table:
ID T R
1 2 NULL
1 3 Y
1 4 NULL
1 5 NULL
1 6 Y
1 7 NULL
I would like to add a column which equals the value from column T based on the last non-null value from column R. This means the following:
ID T R GOAL
1 2 NULL NULL
1 3 Y NULL
1 4 Y 3
1 5 NULL 4
1 6 Y 4
1 7 NULL 6
I have many IDs, so I need to use the OVER (PARTITION BY ...) clause. Also, if possible, I would like to use a single statement, like
SELECT *
     , GOAL
FROM TABLE
That is, without any extra SELECT statement.
Because T is in ascending order, you can null out T wherever R is null and take the maximum over the preceding rows:
select *,
       max(case when R is not null then T end)
           over (
               partition by id
               order by T
               rows between unbounded preceding and 1 preceding
           ) as GOAL
from TBL
http://sqlfiddle.com/#!18/c927a5/5
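As a quick check of the windowed maximum above, here is a hedged sketch that runs the same query shape in SQLite via Python's sqlite3 module (SQLite 3.25 or later), used here only as a stand-in for Snowflake. The rows for ID 1 follow the desired-output table, with R = 'Y' at T = 4; the rows for ID 2 are hypothetical and only show that the partition keeps IDs separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (id integer, t integer, r text)")
conn.executemany("insert into tbl values (?, ?, ?)", [
    (1, 2, None), (1, 3, "Y"), (1, 4, "Y"),
    (1, 5, None), (1, 6, "Y"), (1, 7, None),
    (2, 1, "Y"), (2, 2, None),          # hypothetical second ID
])

query = """
select id, t, r,
       -- T survives only where R is set; the frame stops one row before the current row
       max(case when r is not null then t end)
           over (partition by id
                 order by t
                 rows between unbounded preceding and 1 preceding) as goal
from tbl
order by id, t
"""
rows = conn.execute(query).fetchall()
goal = [row[3] for row in rows]

for row in rows:
    print(row)
```

The frame ends at 1 preceding, so a row never sees its own R value, and the first row of each ID gets NULL.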

Resetting a Count in SQL

I have data that looks like this:
ID num_of_days
1 0
2 0
2 8
2 9
2 10
2 15
3 10
3 20
I want to add another column that increments in value only if the num_of_days column is divisible by 5 or the ID number increases so my end result would look like this:
ID num_of_days row_num
1 0 1
2 0 2
2 8 2
2 9 2
2 10 3
2 15 4
3 10 5
3 20 6
Any suggestions?
Edit #1:
num_of_days represents the number of days since the customer last saw a doctor between 1 visit and the next.
A customer can see a doctor 1 time or they can see a doctor multiple times.
If it's the first time visiting, the num_of_days = 0.
SQL tables represent unordered sets. Based on your question, I'll assume that the combination of id/num_of_days provides the ordering.
You can use a cumulative sum combined with lag():
select t.*,
       sum(case when prev_id = id and num_of_days % 5 <> 0
                then 0 else 1
           end) over (order by id, num_of_days) as row_num
from (select t.*,
             lag(id) over (order by id, num_of_days) as prev_id
      from t
     ) t;
Here is a db<>fiddle.
If you have a different ordering column, then just use that in the order by clauses.
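To see the counter advance on the sample data, the lag() plus cumulative-sum query can be run as below. This is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later) as a convenient engine; the table name t comes from the answer, and the row_num alias is added for readability.

```python
import sqlite3

data = [(1, 0), (2, 0), (2, 8), (2, 9), (2, 10), (2, 15), (3, 10), (3, 20)]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, num_of_days integer)")
conn.executemany("insert into t values (?, ?)", data)

query = """
select id, num_of_days,
       -- add 1 whenever the id changes or num_of_days is divisible by 5
       sum(case when prev_id = id and num_of_days % 5 <> 0
                then 0 else 1
           end) over (order by id, num_of_days) as row_num
from (select t.*,
             lag(id) over (order by id, num_of_days) as prev_id
      from t
     ) t
order by id, num_of_days
"""
rows = conn.execute(query).fetchall()
row_nums = [r[2] for r in rows]
print(row_nums)
```

On the first row prev_id is NULL, so the comparison is not true and the else branch starts the count at 1.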

sql - select single ID for each group with the lowest value

Consider the following table:
ID GroupId Rank
1 1 1
2 1 2
3 1 1
4 2 10
5 2 1
6 3 1
7 4 5
I need an SQL (for MS-SQL) select query selecting a single ID for each group with the lowest rank. Each group needs to return only a single ID, even if there are two with the same rank (as IDs 1 and 3 do in the above table). I've tried to select the min value, but the requirement that only one row be returned, and that the value to be returned is the ID column, is throwing me.
Does anyone know how to do this?
Use row_number():
select t.*
from (select t.*,
             row_number() over (partition by groupid order by rank) as seqnum
      from t
     ) t
where seqnum = 1;
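When two rows in a group share the lowest rank, row_number() still returns exactly one of them, but which one is not guaranteed. The sketch below adds id as a tie-breaker so the choice is repeatable; that extra sort key is an assumption about what is wanted, not part of the answer above. It runs in SQLite (3.25 or later) through Python's sqlite3 module rather than in SQL Server, and quotes the rank column name defensively.

```python
import sqlite3

data = [(1, 1, 1), (2, 1, 2), (3, 1, 1), (4, 2, 10),
        (5, 2, 1), (6, 3, 1), (7, 4, 5)]

conn = sqlite3.connect(":memory:")
conn.execute('create table t (id integer, groupid integer, "rank" integer)')
conn.executemany("insert into t values (?, ?, ?)", data)

query = """
select groupid, id
from (select t.*,
             -- lowest rank first; id breaks ties deterministically (assumption)
             row_number() over (partition by groupid
                                order by "rank", id) as seqnum
      from t
     ) t
where seqnum = 1
order by groupid
"""
picked = conn.execute(query).fetchall()
print(picked)
```

For group 1, IDs 1 and 3 tie on rank 1, and the tie-breaker selects ID 1.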

oracle - no partition in window function but fill sequential numbers for acd properties

I have an acd properties table with 3 columns: id, acd and rpt. The rpt column is set to 1 when an acd property is first reported, but if any subsequent acd properties are repeats, it is set to 0. The id column is always incrementing (a sort of primary key). For the continuous zeros, I need sequential numbers starting from 2, 3, ... as shown in the wanted column.
id acd rpt wanted
1 a 1 1
2 b 1 1
3 b 0 2
4 a 1 1
5 a 0 2
6 a 0 3
7 d 1 1
8 d 0 2
9 d 0 3
10 c 1 1
11 c 0 2
12 c 0 3
13 c 0 4
14 c 0 5
15 d 1 1
16 a 1 1
I tried the window function, but when I use the "value" column in the partition clause, it groups all the a's together, which is not desired. Is it possible to get the results shown in the "wanted" column, given rpt and the incrementing id?
When rpt = 1, then you want 1. Then you want the 0s enumerated for each acd. If this is correct, then the logic is:
select t.*,
       (case when rpt = 1 then 1
             else 1 + row_number() over (partition by acd, rpt order by id)
        end) as wanted
from t;
You need nested OLAP functions:
SELECT dt.*,
       Row_Number() Over (PARTITION BY grp ORDER BY id) AS wanted
FROM
 ( -- calculate a group number using a cumulative sum over 0/1 (for partitioning in the next step)
   SELECT prop.*, Sum(rpt) Over (ORDER BY id ROWS Unbounded Preceding) AS grp
   FROM prop
 ) dt;
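The nested approach can be checked against the wanted column from the question. This is a hedged sketch run in SQLite (3.25 or later) through Python's sqlite3 module rather than in Oracle; the table name prop is taken from the query above. Because the group number comes from a running sum of rpt ordered by id, a new group starts at every rpt = 1 row, even when the same acd value appears again later.

```python
import sqlite3

data = [(1, "a", 1), (2, "b", 1), (3, "b", 0), (4, "a", 1),
        (5, "a", 0), (6, "a", 0), (7, "d", 1), (8, "d", 0),
        (9, "d", 0), (10, "c", 1), (11, "c", 0), (12, "c", 0),
        (13, "c", 0), (14, "c", 0), (15, "d", 1), (16, "a", 1)]
expected = [1, 1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5, 1, 1]

conn = sqlite3.connect(":memory:")
conn.execute("create table prop (id integer, acd text, rpt integer)")
conn.executemany("insert into prop values (?, ?, ?)", data)

query = """
select id, acd, rpt,
       row_number() over (partition by grp order by id) as wanted
from (select prop.*,
             -- running sum of the 0/1 flag: increases by 1 at each rpt = 1 row
             sum(rpt) over (order by id rows unbounded preceding) as grp
      from prop
     ) dt
order by id
"""
wanted = [row[3] for row in conn.execute(query)]
print(wanted)
```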

How to find the SQL medians for a grouping

I am working with SQL Server 2008.
If I have a table like this:
Code Value
-----------------------
4 240
4 299
4 210
2 NULL
2 3
6 30
6 80
6 10
4 240
2 30
How can I find the median AND group by the Code column please?
To get a resultset like this:
Code Median
-----------------------
4 240
2 16.5
6 30
I really like this solution for median, but unfortunately it doesn't include Group By:
https://stackoverflow.com/a/2026609/106227
The solution using rank works nicely when you have an odd number of members in each group, i.e. when the median exists within the sample. Where you have an even number of members, the rank method falls down, e.g.
1
2
3
4
The median here is 2.5 (i.e. half the group is smaller, and half the group is larger) but the rank method will return 3. To get around this you essentially need to take the top value from the bottom half of the group, and the bottom value of the top half of the group, and take an average of the two values.
WITH CTE AS
(   SELECT  Code,
            Value,
            [half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value),
            [half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
    FROM    T
    WHERE   Value IS NOT NULL
)
SELECT  Code,
        (MAX(CASE WHEN Half1 = 1 THEN Value END) +
         MIN(CASE WHEN Half2 = 1 THEN Value END)) / 2.0
FROM    CTE
GROUP BY Code;
Example on SQL Fiddle
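The two-halves idea can be exercised on the question's sample data. The sketch below uses SQLite (3.25 or later) via Python's sqlite3 module instead of SQL Server, so the column aliases use standard "as" syntax rather than the [alias] = form; otherwise the logic mirrors the NTILE query above.

```python
import sqlite3

data = [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
        (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (code integer, value integer)")
conn.executemany("insert into t values (?, ?)", data)

query = """
with cte as (
    select code, value,
           ntile(2) over (partition by code order by value)      as half1,
           ntile(2) over (partition by code order by value desc) as half2
    from t
    where value is not null
)
select code,
       -- top of the lower half plus bottom of the upper half, averaged
       (max(case when half1 = 1 then value end) +
        min(case when half2 = 1 then value end)) / 2.0 as median
from cte
group by code
order by code
"""
medians = dict(conn.execute(query).fetchall())
print(medians)
```

With an odd count, NTILE(2) puts the extra row in the first tile in both directions, so both halves contain the middle value and the average is that value itself.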
In SQL Server 2012 and later, you can use PERCENTILE_CONT:
SELECT DISTINCT
       Code,
       Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM   T;
Example on SQL Fiddle
SQL Server 2008 does not have a built-in function to calculate medians, but you could use the ROW_NUMBER function like this:
WITH RankedTable AS (
    SELECT Code, Value,
           ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Value) AS Rnk,
           COUNT(*) OVER (PARTITION BY Code) AS Cnt
    FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1
To elaborate a bit on this solution, consider the output of the RankedTable CTE:
Code Value Rnk Cnt
---------------------------
4 240 2 3 -- Median
4 299 3 3
4 210 1 3
2 NULL 1 2
2 3 2 2 -- Median
6 30 2 3 -- Median
6 80 3 3
6 10 1 3
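Now, from this result set, if you return only the rows where Rnk equals Cnt / 2 + 1 (integer division), you get the middle row of each group. That row is the median only when the group has an odd number of values; for an even count it is the upper of the two middle values, and NULL values are still counted unless they are filtered out.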
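One way to make the ROW_NUMBER approach handle even-sized groups is to select both middle ranks and average them. The following is a sketch of that variation, not the answer's original query: it filters out NULLs first and keeps the rows whose rank is (Cnt + 1) / 2 or (Cnt + 2) / 2, which are the same rank when Cnt is odd. It runs in SQLite (3.25 or later) through Python's sqlite3 module; the table name t is an assumption.

```python
import sqlite3

data = [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
        (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (code integer, value integer)")
conn.executemany("insert into t values (?, ?)", data)

query = """
with ranked as (
    select code, value,
           row_number() over (partition by code order by value) as rnk,
           count(*)     over (partition by code)                as cnt
    from t
    where value is not null
)
select code, avg(value) as median
from ranked
-- integer division: one middle rank for odd cnt, two for even cnt
where rnk in ((cnt + 1) / 2, (cnt + 2) / 2)
group by code
order by code
"""
medians = dict(conn.execute(query).fetchall())
print(medians)
```

Both expressions rely on integer division, which is what SQLite and SQL Server do when both operands are integers.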