How would I use a window function or similar, to number each group or partition of rows, based on certain shared characteristics?
For example:
I have a list of names ordered alphabetically that I wish to group and identify using IDs that describe the group that they belong to and position within each group.
-------------------------------------------
| outer_id | inner_id | src_id | name |
|----------|----------|--------|----------|
| 1 | 1 | 88129 | albert |
| 1 | 2 | 88130 | albrecht |
| 1 | 3 | 88131 | allan |
| 2 | 1 | 88132 | barnaby |
| 2 | 2 | 88133 | barry |
| 2 | 3 | 88134 | bart |
-------------------------------------------
I can achieve inner_id, src_id and name using a query similar to the following:
WITH cte (src_id, name) AS (
VALUES
(88129, 'albert'),
(88130, 'albrecht'),
(88131, 'allan'),
(88132, 'barnaby'),
(88133, 'barry'),
(88134, 'bart')
)
SELECT row_number() OVER (partition by left(name, 1) ORDER BY name) AS inner_id, src_id, name
FROM cte;
How would I go about adding an outer_id column as shown, to represent each window (or group)?
You can use dense_rank():
select dense_rank() over (order by left(name, 1)) as outer_id,
row_number() over (partition by left(name, 1) order by name) as inner_id,
src_id, name
from cte;
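As a rough check of the dense_rank() approach, here is a minimal sketch that runs the same idea against the sample names using Python's built-in sqlite3 module. It makes a few assumptions that are not part of the original answer: SQLite 3.25 or later (for window functions), a hypothetical table called names, substr(name, 1, 1) in place of left(name, 1), which SQLite does not provide, and an ascending sort so that inner_id matches the table in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "names" is a hypothetical table standing in for the CTE in the question.
conn.execute("CREATE TABLE names (src_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [(88129, "albert"), (88130, "albrecht"), (88131, "allan"),
     (88132, "barnaby"), (88133, "barry"), (88134, "bart")],
)

# dense_rank() numbers the groups; row_number() numbers rows inside each group.
# substr(name, 1, 1) is SQLite's equivalent of left(name, 1).
rows = conn.execute("""
    SELECT dense_rank() OVER (ORDER BY substr(name, 1, 1)) AS outer_id,
           row_number() OVER (PARTITION BY substr(name, 1, 1) ORDER BY name) AS inner_id,
           src_id, name
    FROM names
    ORDER BY outer_id, inner_id
""").fetchall()

for row in rows:
    print(row)
```

dense_rank() is used rather than rank() because rank() leaves gaps after ties, so the second group would be numbered 4 instead of 2.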
I have a table that I want to join some other tables on to. The table called "OfficePers" has a field for the Office Location ID as well as a field for the IDs of people who work in that location and another field with their names. For example, the table is of the following format:
| OfficeLocation | PersonID |
|-------------- | -------- |
|321 | 2323 |
|321 | 2355 |
|321 | 1234 |
|321 | 7899 |
|321 | 32091 |
|321 | 777 |
|1654 | 4232 |
|121243 | 345 |
|121243 | 343 |
|121243 | 111 |
What I want to do is create a subquery that returns one result per office location and creates aliases for each personID and name - so the above table would be transformed into something like the following:
| OfficeLocation | PersonID_1 | PersonID_2 | PersonID_3 | PersonID_4| PersonID_5| PersonID_6|
| -------------- | ----------- |----------- |----------- |-----------|-----------|-----------|
| 321 | 2323 | 2355 | 1234 | 7899 | 32091 | 777 |
| 1654 | 4232 | | | | | |
| 121243 | 345 | 343 | 111 | | | |
I was thinking of joining the "OfficePers" table to itself multiple times, but I'm not sure what function I could use to pick out each Person ID. I'm familiar with Max and Min, but those wouldn't work when there are more than two Person IDs at the same location.
The simplest method is to combine these into a single column using listagg():
select OfficeLocation,
listagg(personid, ',') within group (order by personid) as personids
from t
group by OfficeLocation;
I do not recommend putting the values in separate columns for several reasons. First, you don't know how many columns you will need. Second, you can do this using dynamic SQL, but you cannot create a view for the result. Third, the same person could -- in theory -- appear in multiple rows, but the person would likely be in different columns.
EDIT:
If you want exactly six columns, you can use conditional aggregation:
select OfficeLocation,
       max(case when seqnum = 1 then PersonId end) as PersonId_1,
       max(case when seqnum = 2 then PersonId end) as PersonId_2,
       max(case when seqnum = 3 then PersonId end) as PersonId_3,
       max(case when seqnum = 4 then PersonId end) as PersonId_4,
       max(case when seqnum = 5 then PersonId end) as PersonId_5,
       max(case when seqnum = 6 then PersonId end) as PersonId_6
from (select t.*,
             row_number() over (partition by OfficeLocation order by personid) as seqnum
      from t
     ) t
group by OfficeLocation;
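The conditional aggregation above can be tried on the sample data with a small sketch like the following. It uses Python's sqlite3 module as a stand-in database (an assumption; SQLite 3.25+ is needed for row_number()), and only pivots the first three positions to keep it short. Note that the columns come out ordered by PersonID, not in the order the rows appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OfficePers (OfficeLocation INTEGER, PersonID INTEGER)")
conn.executemany(
    "INSERT INTO OfficePers VALUES (?, ?)",
    [(321, 2323), (321, 2355), (321, 1234), (321, 7899), (321, 32091),
     (321, 777), (1654, 4232), (121243, 345), (121243, 343), (121243, 111)],
)

# Number the people within each office, then pick one position per output column.
# A CASE without ELSE yields NULL, which max() ignores.
rows = conn.execute("""
    SELECT OfficeLocation,
           max(CASE WHEN seqnum = 1 THEN PersonID END) AS PersonID_1,
           max(CASE WHEN seqnum = 2 THEN PersonID END) AS PersonID_2,
           max(CASE WHEN seqnum = 3 THEN PersonID END) AS PersonID_3
    FROM (SELECT OfficeLocation, PersonID,
                 row_number() OVER (PARTITION BY OfficeLocation
                                    ORDER BY PersonID) AS seqnum
          FROM OfficePers) t
    GROUP BY OfficeLocation
    ORDER BY OfficeLocation
""").fetchall()

for row in rows:
    print(row)
```

Offices with fewer people than columns simply get NULL (None in Python) in the unused positions.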
I have a table that has a number column and an attribute column like this:
1.
+-----+-----+
| num | att |
-------------
| 1 | a |
| 1 | b |
| 1 | a |
| 2 | a |
| 2 | b |
| 2 | b |
+------------
I want to make the number unique, with the attribute being whichever one occurred most often for that number, like this (this is the end product I'm interested in):
2.
+-----+-----+
| num | att |
-------------
| 1 | a |
| 2 | b |
+------------
I have been working on this for a while and managed to write myself a query that looks up how many times an attribute occurs for a given number like this:
3.
+-----+-----+-----+
| num | att |count|
------------------+
| 1 | a | 2 |
| 1 | b | 1 |
| 2 | a | 1 |
| 2 | b | 2 |
+-----------------+
But I can't think of a way to only select those rows from the above table where the count is the highest (for each number of course).
So basically, what I am asking is: given table 3, how do I select only the rows with the highest count for each number? (Of course, an answer describing a way to get from table 1 to table 2 directly also works. :) )
You can use aggregation and window functions:
select num, att
from (
select num, att, row_number() over(partition by num order by count(*) desc, att) rn
from mytable
group by num, att
) t
where rn = 1
For each num, this brings the most frequent att; if there are ties, the smaller att is retained.
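To see the tie-breaking behaviour on the sample data, here is a minimal sketch using Python's sqlite3 module (an assumption; the answer does not name a database, and SQLite 3.25+ is required for window functions). It splits the work into two steps, first building the counts (table 3 in the question) and then ranking them, which is equivalent to the single query above but avoids relying on an aggregate inside the window's ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (num INTEGER, att TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "a"), (2, "a"), (2, "b"), (2, "b")],
)

rows = conn.execute("""
    SELECT num, att
    FROM (
        -- Rank each (num, att) pair by how often it occurs; ties go to the smaller att.
        SELECT num, att,
               row_number() OVER (PARTITION BY num ORDER BY cnt DESC, att) AS rn
        FROM (
            -- Inner step: occurrences of each attribute per number.
            SELECT num, att, count(*) AS cnt
            FROM mytable
            GROUP BY num, att
        ) counts
    ) ranked
    WHERE rn = 1
    ORDER BY num
""").fetchall()

print(rows)
```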
Oracle has an aggregation function that does this, stats_mode():
select num, stats_mode(att)
from t
group by num;
In statistics, the most common value is called the mode -- hence the name of the function.
You can use group by and count as below:
select id, col, count(col) as count
from
df_b_sql
group by id, col
Being a beginner at SQL, I'm stuck.
I have a table structure like this:
+------+-------+-----------------------------------------+
| id | name | content |
+------+-------+-----------------------------------------+
| 1 | Jack | ... |
| 2 | Dan | ... |
| 1 | Joe | ... |
| 1 | Jeoffery | ... |
+------+-------+-----------------------------------------+
What I want to do is select the distinct IDs along with the longest name for each ID.
For example, for ID 1 it should return Jeoffery, while for ID 2 it should return Dan.
Any help would be much appreciated.
You can use ROW_NUMBER():
;WITH CTE AS
(
    SELECT id,
           name,
           RN = ROW_NUMBER() OVER(PARTITION BY id ORDER BY LEN(name) DESC)
    FROM YourTable -- placeholder: the question does not give the table name
)
SELECT id,
       name
FROM CTE
WHERE RN = 1;
I am using SQL Server and have a table set up like below:
| id | subject | content | moreContent | modified |
| 1 | subj1 | aaaa | aaaaaaaaaaa | 03/03/2015 |
| 2 | subj1 | bbbb | aaaaaaaaaaa | 03/05/2015 |
| 3 | subj2 | cccc | aaaaaaaaaaa | 03/03/2015 |
| 4 | subj1 | dddd | aaaaaaaaaaa | 03/01/2015 |
| 5 | subj2 | eeee | aaaaaaaaaaa | 07/02/2015 |
I want to select the latest record for each subject heading, so the records to be returned would be:
| id | subject | content | moreContent | modified |
| 2 | subj1 | bbbb | aaaaaaaaaaa | 03/05/2015 |
| 3 | subj2 | cccc | aaaaaaaaaaa | 03/03/2015 |
SELECT Subject, MAX(Modified) FROM [CareManagement].[dbo].[Careplans] GROUP BY Subject
I could do a query like the one above, but I want to preserve all of the content from the selected rows. To return the content columns I would need to apply an aggregate function, or add them to the group by clause which wouldn't give me the desired effect.
I have also looked at nested queries but not found a successful solution yet. If anyone could assist that would be great.
You can use ROW_NUMBER():
SELECT id, subject, content, moreContent, modified
FROM (
SELECT id, subject, content, moreContent, modified,
ROW_NUMBER() OVER (PARTITION BY subject
ORDER BY modified DESC) AS rn
FROM [CareManagement].[dbo].[Careplans] ) t
WHERE rn = 1
Filtering on rn = 1 returns exactly one record per subject: the one with the latest modified date. If two or more records share the same latest date and you want all of them returned, have a look at the RANK() window function instead.
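The difference between ROW_NUMBER() and RANK() only shows up when there is a tie, so here is a small sketch that forces one. Several things in it are assumptions for illustration: it uses Python's sqlite3 module (3.25+) in place of SQL Server, a cut-down Careplans table with only three columns, dates rewritten as ISO strings on the reading that the question's dates are day/month/year, and a hypothetical sixth row that shares the latest date for subj2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Careplans (id INTEGER, subject TEXT, modified TEXT)")
conn.executemany(
    "INSERT INTO Careplans VALUES (?, ?, ?)",
    [(1, "subj1", "2015-03-03"), (2, "subj1", "2015-05-03"),
     (3, "subj2", "2015-03-03"), (4, "subj1", "2015-01-03"),
     (5, "subj2", "2015-02-07"),
     (6, "subj2", "2015-03-03")],  # hypothetical extra row that ties with id 3
)

def latest_ids(window_func):
    """Return ids of the rows ranked first per subject by the given window function."""
    query = f"""
        SELECT id FROM (
            SELECT id,
                   {window_func}() OVER (PARTITION BY subject
                                         ORDER BY modified DESC) AS rn
            FROM Careplans
        ) t
        WHERE rn = 1
        ORDER BY id
    """
    return [row[0] for row in conn.execute(query)]

row_number_ids = latest_ids("row_number")  # one id per subject; the tie is broken arbitrarily
rank_ids = latest_ids("rank")              # every id that shares the latest date
print(row_number_ids, rank_ids)
```

With ROW_NUMBER(), which of the two tied subj2 rows is returned is not defined unless a tie-breaker such as id is added to the ORDER BY.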
Using ROW_NUMBER this becomes pretty simple.
with myCTE as
(
select id
, Subject
, content
, morecontent
, Modified
, ROW_NUMBER() over (PARTITION BY [Subject] order by Modified desc) as RowNum
from [CareManagement].[dbo].[Careplans]
)
select id
, Subject
, content
, morecontent
, Modified
from myCTE
where RowNum = 1
You could use the rank window function to retrieve only the latest record:
SELECT id, subject, content, moreContent, modified
FROM (SELECT id, subject, content, moreContent, modified,
RANK() OVER (PARTITION BY subject ORDER BY modified DESC) AS rk
FROM [CareManagement].[dbo].[Careplans]) t
WHERE rk = 1
I have these two tables, sport and student:
First table sport:
|idsport | name |
_______________________
| 1 | bobsled |
| 2 | skating |
| 3 | boarding |
| 4 | iceskating |
| 5 | skiing |
Second table student (sport_idsport is a foreign key):
|idstudent | name | sport_idsport |
__________________________________________
| 1 | john | 3 |
| 2 | pauly | 2 |
| 3 | max | 1 |
| 4 | jane | 2 |
| 5 | nico | 5 |
So far I have this, which outputs the sport ID that occurs most often, but I can't get it to work with two tables:
SELECT sport_idsport
FROM (SELECT sport_idsport FROM student GROUP BY sport_idsport ORDER BY COUNT(*) desc)
WHERE ROWNUM<=1;
I need to output the name of the most popular sport; in this case it would be skating.
I use Oracle SQL.
with counter as (
Select sport_idsport,
count(*) as cnt,
dense_rank() over (order by count(*) desc) as rn
from student
group by sport_idsport
)
select s.*, c.cnt
from sport s
join counter c on c.sport_idsport = s.idsport and c.rn = 1;
SQLFiddle example: http://sqlfiddle.com/#!4/b76e21/1
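For readers without an Oracle instance, the same count, rank and join idea can be checked on the sample tables with a sketch like this one. It is an approximation under stated assumptions: Python's sqlite3 module (3.25+) replaces Oracle, and the counting and ranking are split into two CTEs (counts and ranked are names chosen here, not from the answer) rather than ranking by count(*) directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sport (idsport INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE student (idstudent INTEGER PRIMARY KEY, name TEXT, "
    "sport_idsport INTEGER REFERENCES sport(idsport))"
)
conn.executemany(
    "INSERT INTO sport VALUES (?, ?)",
    [(1, "bobsled"), (2, "skating"), (3, "boarding"),
     (4, "iceskating"), (5, "skiing")],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "john", 3), (2, "pauly", 2), (3, "max", 1), (4, "jane", 2), (5, "nico", 5)],
)

rows = conn.execute("""
    WITH counts AS (
        -- How many students chose each sport.
        SELECT sport_idsport, count(*) AS cnt
        FROM student
        GROUP BY sport_idsport
    ),
    ranked AS (
        -- dense_rank() gives rank 1 to every sport tied for the highest count.
        SELECT sport_idsport, cnt,
               dense_rank() OVER (ORDER BY cnt DESC) AS rn
        FROM counts
    )
    SELECT s.name, r.cnt
    FROM sport s
    JOIN ranked r ON r.sport_idsport = s.idsport
    WHERE r.rn = 1
""").fetchall()

print(rows)
```

Because dense_rank() is used, two sports tied for first place would both be returned, whereas a rownum = 1 filter keeps only one of them.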
select cnt, sport_idsport from (
select count(*) cnt, sport_idsport
from student
group by sport_idsport
order by count(*) desc
)
where rownum = 1