Oracle analytic function - using FIRST_VALUE to remove unwanted rows - sql

I believe the Oracle function FIRST_VALUE is what I need to be using based on these two questions:
SQL - How to select a row having a column with max value
Oracle: Taking the record with the max date
I have 3 tables that represent people associated with organizations. Each organization may have a parent org, where ORG.PARENT is a foreign key to ORG.ID (so the table refers to itself). A person may be associated with more than one group.
PERSON
ID NAME
----------
1 Bob
ORG
ID NAME PARENT
------------------------
1 A (null)
2 A-1 1
3 A-2 1
4 A-3 1
5 A-1-a 2
6 A-1-b 2
7 A-2-a 3
8 A-2-b 3
PERSON_TO_ORG
PERSON_ID ORG_ID
-----------------
1 1
1 3
I want to list the groups a person is associated with so I used this query:
SELECT NAME, ID, sys_connect_by_path(NAME, '/') AS path
FROM org
START WITH ID IN
(SELECT org_id FROM person_to_org WHERE person_id=1)
connect by prior org.ID = org.parent;
...which gives me:
NAME ID PATH
------------------
A-2 3 /A-2
A-2-a 7 /A-2/A-2-a
A-2-b 8 /A-2/A-2-b
A 1 /A
A-1 2 /A/A-1
A-1-a 5 /A/A-1/A-1-a
A-1-b 6 /A/A-1/A-1-b
A-2 3 /A/A-2
A-2-a 7 /A/A-2/A-2-a
A-2-b 8 /A/A-2/A-2-b
A-3 4 /A/A-3
Notice how A-2 appears twice, as it should. I don't want a group to appear twice, however. I want a group to only appear at its lowest level in the tree, i.e. at its highest level value. Here is how I've tried using FIRST_VALUE with no luck - I still get A-2 (and others) appearing twice:
SELECT id, name, path, first_value(lev) OVER
(
PARTITION BY ID,NAME, path ORDER BY lev DESC
) AS max_lev FROM
(SELECT NAME, ID, sys_connect_by_path(NAME, '/') AS path, LEVEL as lev
FROM org START WITH ID IN
(SELECT org_id FROM person_to_org WHERE person_id=1)
connect by prior org.ID = org.parent);
This seems similar to the FIRST_VALUE example in Pro Oracle SQL but I can't seem to make it work no matter how I tweak the parameters.
How can I return only the rows where a given group has its highest level value (i.e. farthest down in the tree)?

As also said in one of the threads you refer to, analytics are not the most efficient way to go here: you need to aggregate to filter out the duplicates.
SELECT id
     , max(name) keep (dense_rank last order by lev) name
     , max(path) keep (dense_rank last order by lev) path
FROM ( SELECT NAME
            , ID
            , sys_connect_by_path(NAME, '/') AS path
            , LEVEL as lev
       FROM org
       START WITH ID IN (SELECT org_id FROM person_to_org WHERE person_id=1)
       connect by prior org.ID = org.parent
     )
group by id;
ID NAME PATH
---------- ----- --------------------
1 A /A
2 A-1 /A/A-1
3 A-2 /A/A-2
4 A-3 /A/A-3
5 A-1-a /A/A-1/A-1-a
6 A-1-b /A/A-1/A-1-b
7 A-2-a /A/A-2/A-2-a
8 A-2-b /A/A-2/A-2-b
8 rows selected.
Regards,
Rob.
PS: Here is some more information about the LAST aggregate function: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions071.htm#sthref1495

What about this (untested):
SELECT id,
name,
path
FROM (
SELECT id,
name,
path,
row_number() over (partition by id,name order by lev desc) as rn
FROM (
SELECT NAME,
ID,
sys_connect_by_path(NAME, '/') AS path,
LEVEL as lev
FROM org
START WITH ID IN (SELECT org_id FROM person_to_org WHERE person_id=1)
connect by prior org.ID = org.parent
)
)
where rn = 1

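You should partition only by ID and NAME, i.e. OVER (PARTITION BY ID, NAME ORDER BY lev DESC), not by ID, NAME, path. Every path is unique, so including it puts each row in its own partition and nothing gets collapsed.
Edit:
You probably also want first_value(path), not first_value(lev).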
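For readers without an Oracle instance, the "keep only the deepest occurrence of each org" idea can be tried with a small Python script using the standard-library sqlite3 module. This is only a sketch under stated assumptions: SQLite has no CONNECT BY or SYS_CONNECT_BY_PATH, so a recursive CTE stands in for the hierarchical query, and the names `tree`, `ranked` and `deepest_rows` are made up for the illustration. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER);
    INSERT INTO org VALUES
        (1,'A',NULL),(2,'A-1',1),(3,'A-2',1),(4,'A-3',1),
        (5,'A-1-a',2),(6,'A-1-b',2),(7,'A-2-a',3),(8,'A-2-b',3);
    CREATE TABLE person_to_org (person_id INTEGER, org_id INTEGER);
    INSERT INTO person_to_org VALUES (1,1),(1,3);
""")

query = """
WITH RECURSIVE tree(id, name, path, lev) AS (
    -- anchor: the orgs the person is directly associated with
    SELECT o.id, o.name, '/' || o.name, 1
    FROM org o
    WHERE o.id IN (SELECT org_id FROM person_to_org WHERE person_id = ?)
    UNION ALL
    -- recursive step: children of rows found so far
    SELECT c.id, c.name, t.path || '/' || c.name, t.lev + 1
    FROM org c JOIN tree t ON c.parent = t.id
),
ranked AS (
    -- partition by id only: path is unique per row and must stay out
    SELECT id, name, path,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY lev DESC) AS rn
    FROM tree
)
SELECT id, name, path FROM ranked WHERE rn = 1 ORDER BY id
"""

deepest_rows = conn.execute(query, (1,)).fetchall()
for row in deepest_rows:
    print(row)
```

Each of the eight orgs should come back exactly once, with A-2 reported under /A/A-2 rather than /A-2.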

Related

sql - select single ID for each group with the lowest value

Consider the following table:
ID GroupId Rank
1 1 1
2 1 2
3 1 1
4 2 10
5 2 1
6 3 1
7 4 5
I need an SQL (for MS-SQL) select query that selects a single Id for each group with the lowest rank. Each group must return only a single ID, even if two rows share the same rank (as IDs 1 and 3 do in the above table). I've tried selecting the min value, but the requirement that only one row be returned, and that the returned value be the ID column, is throwing me.
Does anyone know how to do this?
Use row_number():
select t.*
from (select t.*,
row_number() over (partition by groupid order by rank) as seqnum
from t
) t
where seqnum = 1;
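A quick way to check this against the sample data is a throwaway Python script using the standard-library sqlite3 module (SQLite 3.25+ for window functions). This is a sketch, not the MS-SQL query itself: the column is renamed to `rnk` to avoid clashing with the RANK() function name, and `id` is added to the ORDER BY as an assumed tie-breaker so the choice between tied rows is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, groupid INTEGER, rnk INTEGER);
    INSERT INTO t VALUES
        (1,1,1),(2,1,2),(3,1,1),(4,2,10),(5,2,1),(6,3,1),(7,4,5);
""")

lowest_per_group = conn.execute("""
    SELECT id, groupid, rnk
    FROM (SELECT t.*,
                 -- number rows per group, lowest rank first, lowest id on ties
                 ROW_NUMBER() OVER (PARTITION BY groupid ORDER BY rnk, id) AS seqnum
          FROM t) AS ranked
    WHERE seqnum = 1
    ORDER BY groupid
""").fetchall()

print(lowest_per_group)
# [(1, 1, 1), (5, 2, 1), (6, 3, 1), (7, 4, 5)]
```

Without the extra `id` in the ORDER BY, either of the tied rows 1 and 3 could be picked for group 1.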

Calculate "position in run" in SQL

I have a table of consecutive ids (integers, 1 ... n), and values (integers), like this:
Input Table:
id value
-- -----
1 1
2 1
3 2
4 3
5 1
6 1
7 1
Going down the table i.e. in order of increasing id, I want to count how many times in a row the same value has been seen consecutively, i.e. the position in a run:
Output Table:
id value position in run
-- ----- ---------------
1 1 1
2 1 2
3 2 1
4 3 1
5 1 1
6 1 2
7 1 3
Any ideas? I've searched for a combination of windowing functions including lead and lag, but can't come up with it. Note that the same value can appear in the value column as part of different runs, so partitioning by value may not help solve this. I'm on Hive 1.2.
One way is to use a difference of row numbers approach to classify consecutive same values into one group. Then a row number function to get the desired positions in each group.
Query to assign groups (Running this will help you understand how the groups are assigned.)
select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final Query using row_number to get positions in each group assigned with the above query.
select id,value,row_number() over(partition by value,rnum_diff order by id) as pos_in_grp
from (select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
) t
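To see the difference-of-row-numbers trick at work on the sample data, here is a small sketch in Python with the standard-library sqlite3 module (assuming SQLite 3.25+ rather than Hive; the alias `grouped` and the variable `positions` are illustrative names only).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO tbl VALUES (1,1),(2,1),(3,2),(4,3),(5,1),(6,1),(7,1);
""")

positions = conn.execute("""
    SELECT id, value,
           ROW_NUMBER() OVER (PARTITION BY value, rnum_diff ORDER BY id) AS pos_in_grp
    FROM (SELECT t.*,
                 -- constant within a run of equal values, changes between runs
                 ROW_NUMBER() OVER (ORDER BY id)
                 - ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) AS rnum_diff
          FROM tbl t) AS grouped
    ORDER BY id
""").fetchall()

for row in positions:
    print(row)
# (1, 1, 1) (2, 1, 2) (3, 2, 1) (4, 3, 1) (5, 1, 1) (6, 1, 2) (7, 1, 3)
```

For ids 1 and 2 the difference is 0, while for ids 5 to 7 it is 2, which is what separates the two runs of value 1.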

postgresql - filter out double rows (but not the first and last one)

I have a PostgreSQL problem.
I have a table that looks like this:
id name level timestamp
1 pete 1 100
2 pete 1 200
3 pete 1 500
4 pete 5 900
7 pete 5 1000
9 pete 5 1200
15 pete 2 700
Now I want to delete the rows I don't need. I only want to keep the first row where he reaches a new level and the last row where he still has that level:
id name level timestamp
1 pete 1 100
3 pete 1 500
15 pete 2 700
4 pete 5 900
9 pete 5 1200
(There are many more columns, such as realmpoints and so on.)
I have a solution that works if the timestamp only ever increases:
SELECT id, name, level, timestamp
FROM player_testing
WHERE id IN ( SELECT MAX(dup.id)
              FROM player_testing AS dup
              GROUP BY dup.name, dup.level
              UNION
              SELECT MIN(dup.id)
              FROM player_testing AS dup
              GROUP BY dup.name, dup.level
            )
ORDER BY timestamp
But I can't find a way to make it work for my problem.
select id, name, level, timestamp
from (
select id,name,level,timestamp,
row_number() over (partition by name, level order by timestamp) as rn,
count(*) over (partition by name, level) as max_rn
from player_testing
) t
where rn = 1 or rn = max_rn;
Btw: timestamp is a horrible name for a column. For one thing it's a reserved word, but more importantly it doesn't document what the column contains. Is it a start_timestamp, an end_timestamp, a valid_until_timestamp, ...?
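The window-function query can be checked against the sample rows with a short Python script using the standard-library sqlite3 module. This is a sketch under assumptions: it runs on SQLite 3.25+ instead of PostgreSQL, and the column is called `ts` here (an illustrative rename, in the spirit of the naming remark above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player_testing (id INTEGER PRIMARY KEY, name TEXT, level INTEGER, ts INTEGER);
    INSERT INTO player_testing VALUES
        (1,'pete',1,100),(2,'pete',1,200),(3,'pete',1,500),
        (4,'pete',5,900),(7,'pete',5,1000),(9,'pete',5,1200),
        (15,'pete',2,700);
""")

kept = conn.execute("""
    SELECT id, name, level, ts
    FROM (SELECT id, name, level, ts,
                 -- position of the row within its (name, level) group, by time
                 ROW_NUMBER() OVER (PARTITION BY name, level ORDER BY ts) AS rn,
                 -- size of the group, i.e. the rn of its last row
                 COUNT(*) OVER (PARTITION BY name, level) AS max_rn
          FROM player_testing) AS t
    WHERE rn = 1 OR rn = max_rn
    ORDER BY ts
""").fetchall()

for row in kept:
    print(row)
# (1, 'pete', 1, 100) (3, 'pete', 1, 500) (15, 'pete', 2, 700)
# (4, 'pete', 5, 900) (9, 'pete', 5, 1200)
```

Because the rows are ordered by `ts` inside each partition, the result does not depend on the ids increasing together with the timestamps.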
Here is an alternative to @a_horse_with_no_name's solution that avoids OVER (PARTITION BY ...) and is therefore more generic SQL:
select *
from player_testing as A
where id = (
select min(id)
from player_testing as B
where A.name = B.name
and A.level = B.level
)
or id = (
select max(id)
from player_testing as B
where A.name = B.name
and A.level = B.level
)
Here is the fiddle to show it working: http://sqlfiddle.com/#!2/47bd44/1

Get data from self-referencing table in all directions

I have a table with such rows:
ID Parent_ID Name
1 (null) A
2 1 B
3 1 C
4 2 D
5 3 E
6 5 F
7 (null) G
8 (null) H
I need to get the IDs of all related rows, no matter whether Name='A' or Name='F' is passed as the criterion. In this case I should receive all IDs except 7 and 8.
I have tried a lot of examples and read a lot of articles, but I give up now. Can you help with it?
with
t as (
select id
from your_table
where name = 'D' -- your starting point
)
select id
from (
select id, parent_id from your_table
where parent_id is not null
union all
select parent_id, id from your_table
where parent_id is not null
union all
select id, null from t
)
start with parent_id is null
connect by nocycle prior id = parent_id
A is at the root of a hierarchy (it's the parent of B, which is the parent of D, etc.). To start with A and work down to F (and also down to D and E, which are also descendants of A through different routes):
SELECT ID, Parent_ID, Name
FROM tbl
START WITH Name = 'A'
CONNECT BY PRIOR ID = Parent_ID
F is at the end of a hierarchy. Oracle calls this a "leaf". To start with leaf F and work up to A at the top:
SELECT ID, Parent_ID, Name
FROM tbl
START WITH Name = 'F' -- start with F instead of A
CONNECT BY PRIOR Parent_ID = ID -- switch the CONNECT BY to work up
Oracle has a SYS_CONNECT_BY_PATH function that's great for visualizing the hierarchy. Here's how to use it in the first query (A down to F):
SELECT ID, Parent_ID, Name, SYS_CONNECT_BY_PATH(Name, '/') AS Path
FROM tbl
START WITH Name = 'A'
CONNECT BY PRIOR ID = Parent_ID
Results:
ID PARENT_ID NAME PATH
---- ---------- ---- -----------
1 A /A
2 1 B /A/B
4 2 D /A/B/D
3 1 C /A/C
5 3 E /A/C/E
6 5 F /A/C/E/F
You can use any delimiter you want as the second argument to SYS_CONNECT_BY_PATH.
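Outside Oracle, the same two walks (down from the root, up from a leaf) can be sketched with recursive CTEs. The following Python script uses the standard-library sqlite3 module. It is only an illustration under assumptions: recursive CTEs replace CONNECT BY, which SQLite does not have, and the names `walk`, `down_ids` and `up_ids` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO tbl VALUES
        (1,NULL,'A'),(2,1,'B'),(3,1,'C'),(4,2,'D'),
        (5,3,'E'),(6,5,'F'),(7,NULL,'G'),(8,NULL,'H');
""")

# Walk down: start at A, repeatedly add rows whose parent is already in the set.
down_ids = [r[0] for r in conn.execute("""
    WITH RECURSIVE walk(id) AS (
        SELECT id FROM tbl WHERE name = 'A'
        UNION ALL
        SELECT c.id FROM tbl c JOIN walk w ON c.parent_id = w.id
    )
    SELECT id FROM walk ORDER BY id
""")]

# Walk up: start at F, repeatedly add the parent of the rows found so far.
up_ids = [r[0] for r in conn.execute("""
    WITH RECURSIVE walk(id, parent_id) AS (
        SELECT id, parent_id FROM tbl WHERE name = 'F'
        UNION ALL
        SELECT p.id, p.parent_id FROM tbl p JOIN walk w ON p.id = w.parent_id
    )
    SELECT id FROM walk ORDER BY id
""")]

print(down_ids)  # [1, 2, 3, 4, 5, 6]
print(up_ids)    # [1, 3, 5, 6]
```

Walking up from F only reaches its ancestors, which is why getting every related row from an arbitrary starting point needs both directions combined.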

Applying a sort order to existing data using SQL 2008R2

I have some existing data that I need to apply a "SortOrder" to based upon a few factors:
The ordering starts at "1" for any given Owner
The ordering is applied alphabetically (basically following an ORDER BY Name) to increase the sort order.
Should two items have the same name (as I've illustrated in my data set), we can apply the lower sort order value to the item with the lower id.
Here is some sample data to help illustrate what I'm talking about:
What I have:
Id OwnerId Name SortOrder
------ ------- ---------------------- ---------
1 1 A Name NULL
2 1 C Name NULL
3 1 B Name NULL
4 2 Z Name NULL
5 2 Z Name NULL
6 2 A Name NULL
What I need:
Id OwnerId Name SortOrder
------ ------- ---------------------- ---------
1 1 A Name 1
3 1 B Name 2
2 1 C Name 3
6 2 A Name 1
4 2 Z Name 2
5 2 Z Name 3
This could either be done in the form of an UPDATE statement or doing an INSERT INTO (...) SELECT FROM (...) if it's easier to move the data from one table to the next.
Easy - use a CTE (Common Table Expression) and the ROW_NUMBER() ranking function:
;WITH OrderedData AS
(
SELECT Id, OwnerId, Name,
ROW_NUMBER() OVER(PARTITION BY OwnerId ORDER BY Name, Id) AS 'SortOrder'
FROM
dbo.YourTable
)
SELECT *
FROM OrderedData
ORDER BY OwnerId, SortOrder
The PARTITION BY clause splits your data into one group per OwnerId value, and ROW_NUMBER() then starts counting at 1 within each group.
Update: If you want to update your table to set the SortOrder column - try this:
;WITH OrderedData AS
(
SELECT
Id, OwnerId, Name,
ROW_NUMBER() OVER(PARTITION BY OwnerId ORDER BY Name, Id) AS 'RowNum'
FROM
dbo.YourTable
)
UPDATE OrderedData
SET SortOrder = RowNum
That should set the SortOrder column to the values that the ROW_NUMBER() function returns.
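The numbering logic can be tried out on the sample data with a small Python script using the standard-library sqlite3 module. This is a sketch with swapped-in mechanics: SQLite (3.25+) cannot update through a CTE the way SQL Server does, so the assumed equivalent here is an UPDATE with a correlated subquery. The table name `items` and alias `ordered` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (Id INTEGER PRIMARY KEY, OwnerId INTEGER, Name TEXT, SortOrder INTEGER);
    INSERT INTO items (Id, OwnerId, Name) VALUES
        (1,1,'A Name'),(2,1,'C Name'),(3,1,'B Name'),
        (4,2,'Z Name'),(5,2,'Z Name'),(6,2,'A Name');
""")

# Number the rows per owner by Name (Id breaks ties) and copy that number
# into SortOrder for the matching Id.
conn.execute("""
    UPDATE items
    SET SortOrder = (
        SELECT RowNum
        FROM (SELECT Id,
                     ROW_NUMBER() OVER (PARTITION BY OwnerId ORDER BY Name, Id) AS RowNum
              FROM items) AS ordered
        WHERE ordered.Id = items.Id
    )
""")

sorted_items = conn.execute(
    "SELECT Id, OwnerId, Name, SortOrder FROM items ORDER BY OwnerId, SortOrder"
).fetchall()
for row in sorted_items:
    print(row)
```

The two 'Z Name' rows for owner 2 get sort orders 2 and 3 in Id order, matching the "What I need" table.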