Oracle SQL -- Analytic functions OVER a group? - sql

My table:
ID NUM VAL
1 1 Hello
1 2 Goodbye
2 2 Hey
2 4 What's up?
3 5 See you
If I want to return the max number for each ID, it's really nice and clean:
SELECT MAX(NUM) FROM table GROUP BY (ID)
But what if I want to grab the value associated with the max of each number for each ID?
Why can't I do:
SELECT MAX(NUM) OVER (ORDER BY NUM) FROM table GROUP BY (ID)
Why is that an error? I'd like to have this select grouped by ID, rather than partitioning separately for each window...
EDIT: The error is "not a GROUP BY expression".

You could probably use the MAX() KEEP(DENSE_RANK LAST...) function:
with sample_data as (
select 1 id, 1 num, 'Hello' val from dual union all
select 1 id, 2 num, 'Goodbye' val from dual union all
select 2 id, 2 num, 'Hey' val from dual union all
select 2 id, 4 num, 'What''s up?' val from dual union all
select 3 id, 5 num, 'See you' val from dual)
select id, max(num), max(val) keep (dense_rank last order by num)
from sample_data
group by id;

When you use windowing function, you don't need to use GROUP BY anymore, this would suffice:
select id,
max(num) over(partition by id)
from x
Actually you can get the result without using windowing function:
select *
from x
where (id,num) in
(
select id, max(num)
from x
group by id
)
Output:
ID NUM VAL
1 2 Goodbye
2 4 What's up
3 5 SEE YOU
http://www.sqlfiddle.com/#!4/a9a07/7
If you want to use windowing function, you might do this:
select id, val,
case when num = max(num) over(partition by id) then
1
else
0
end as to_select
from x
where to_select = 1
Or this:
select id, val
from x
where num = max(num) over(partition by id)
But since it's not allowed to do those, you have to do this:
with list as
(
select id, val,
case when num = max(num) over(partition by id) then
1
else
0
end as to_select
from x
)
select *
from list
where to_select = 1
http://www.sqlfiddle.com/#!4/a9a07/19

If you're looking to get the rows which contain the values from MAX(num) GROUP BY id, this tends to be a common pattern...
WITH
sequenced_data
AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY id ORDER BY num DESC) AS sequence_id,
*
FROM
yourTable
)
SELECT
*
FROM
sequenced_data
WHERE
sequence_id = 1
EDIT
I don't know if TeraData will allow this, but the logic seems to make sense...
SELECT
*
FROM
yourTable
WHERE
num = MAX(num) OVER (PARTITION BY id)
Or maybe...
SELECT
*
FROM
(
SELECT
*,
MAX(num) OVER (PARTITION BY id) AS max_num_by_id
FROM
yourTable
)
AS sub_query
WHERE
num = max_num_by_id
This is slightly different from my previous answer; if multiple records are tied with the same MAX(num), this will return all of them, the other answer will only ever return one.
EDIT
In your proposed SQL the error relates to the fact that the OVER() clause contains a field not in your GROUP BY. It's like trying to do this...
SELECT id, num FROM yourTable GROUP BY id
num is invalid, because there can be multiple values in that field for each row returned (with the rows returned being defined by GROUP BY id).
In the same way, you can't put num inside the OVER() clause.
SELECT
id,
MAX(num), <-- Valid as it is an aggregate
MAX(num) <-- still valid
OVER(PARTITION BY id), <-- Also valid, as id is in the GROUP BY
MAX(num) <-- still valid
OVER(PARTITION BY num) <-- Not valid, as num is not in the GROUP BY
FROM
yourTable
GROUP BY
id
See this question for when you can't specify something in the OVER() clause, and an answer showing when (I think) you can: over-partition-by-question

Related

sql - single line per distinct values in a given column

is there a way using sql, in bigquery more specifically, to get one line per unique value in a given column
I know that this is possible using a sequence of union queries where you have as much union as distinct values as there is in the column of interest. but i'm wondering if there is a better way to do it.
You can use row_number():
select t.* except (seqnum)
from (select t.*, row_number() over (partition by col order by col) as seqnum
from t
) t
where seqnum = 1;
This returns an arbitrary row. You can control which row by adjusting the order by.
Another fun solution in BigQuery uses structs:
select array_agg(t limit 1)[ordinal(1)].*
from t
group by col;
You can add an order by (order by X limit 1) if you want a particular row.
here is just a more formated format :
select tab.* except(seqnum)
from (
select *, row_number() over (partition by column_x order by column_x) as seqnum
from `project.dataset.table`
) as tab
where seqnum = 1
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
You can test, play with above using dummy data as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 col UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1 UNION ALL
SELECT 4, 2 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 6, 3
)
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
with result
Row id col
1 1 1
2 4 2
3 6 3

SQL Server Row_number in ORDER BY CASE

EDITED THE WHOLE TOPIC.
I need to create a view that sort article per type.
If I only have the type : *VALUE -> I need to show this line only.
If I have the type : *VALUE & 2 -> Still showing row accordingly to *VALUE type only.
If I only have the type : 2 -> Showing this one.
I already did somethink like this :
VALUE* is a value that should come from an another table with a Join.
SELECT Id_item ,Name_item , Type_item , Id_type_item FROM ITEM
WHERE Name_item = 'Gillette' AND (Id_Type_item = VALUE* OR Id_Type_item ='10')
ORDER BY CASE
WHEN row_number() OVER(ORDER BY Id_item DESC , Id_Type_Item DESC) <= 1 THEN 0
ELSE 1
END;
But it does that in the case where we've got both row for the types(*VALUE & 10):
Id_item / Name_item / Type_item / Id_Type_Item
1 Gillette 45 30 (*VALUE)
1 Gillette 2 10
So I think that the order by on the Over() could be useful to always sort by *VALUE (which are in reality another column from another table)
I always want to select 1 row of data only ! :)
I'm guessing, that what you want is the "first" row returned from each SELECT? There's no need to use a separate SELECT statement for each variable on the same table, you can use a window function to do so. I believe this is what you might be after.
WITH CTE AS(
--The following assumes table A and B have the same DDL (which begs the question, why are they different tables?)
SELECT *,
ROW_NUMBER() OVER (PARTITION BY var
ORDER BY (SELECT NULL)) AS RN --Replace SELECT(NULL) with your actual ordering criteria
FROM A
WHERE var IN (1,2)
UNION --ALL(?)
SELECT *
ROW_NUMBER() OVER (PARTITION BY var
ORDER BY (SELECT NULL)) AS RN --Replace SELECT(NULL) with your actual ordering criteria
FROM B
WHERE var IN (3))
SELECT *
FROM CTE
WHERE RN = 1;
Here is a possible solution. In this case ROW_NUMBER, RANK and DENSE_RANK would all work. However, ROW_COUNT is not a valid window function in sql server.
DECLARE #A TABLE(ID INT, Value INT)
DECLARE #B TABLE(ID INT,Value INT)
INSERT INTO #A VALUES (1,1),(2,1),(3,2),(4,3),(5,2),(6,1),(7,3)
INSERT INTO #B VALUES (1,1),(2,1),(3,1),(4,2),(5,3),(6,2),(7,1),(8,3)
;WITH D AS
(
SELECT ID,Value FROM #A WHERE Value IN(1,2)
UNION ALL
SELECT ID,Value FROM #B WHERE Value IN (3)
)
SELECT * FROM
(
SELECT
ID, Value,
ValueRankInSet = DENSE_RANK() OVER(PARTITION BY VALUE ORDER BY ID) -- <-- If you do not have an ID field you can subst ID with NEWID() as order is not important
FROM D
)AS X
WHERE ValueRankInSet = 1
Assign the priority within your Select(s) and then order by it in the row_number:
with cte as
(
SELECT *,
row_number()
over (-- partition by ???
order by prio) as Position
FROM
(
SELECT 1 as prio, * FROM A WHERE var = 1
UNION -- probably a more efficient UNION ALL
SELECT 2 as prio, * FROM A WHERE var = 2
UNION -- probably a more efficient UNION ALL
SELECT 3 as prio, * FROM B WHERE var = 3
)
)
select *
from cte
WHERE Position = 1

get intervals of nonchanging value from a sequence of numbers

I need to sumarize a sequence of values into intervals of nonchanging values - begin, end and value for each such interval. I can easily do it in plsql but would like a pure sql solution for both performance and educational reasons. I have been trying for some time to solve it with analytical functions, but can't figure how to properly define windowing clause. The problem I am having is with a repeated value.
Simplified example -
given input:
id value
1 1
2 1
3 2
4 2
5 1
I'd like to get output
from to val
1 2 1
3 4 2
5 5 1
You want to identify groups of adjacent values. One method is to use lag() to find the beginning of the sequence, then a cumulative sum to identify the groups.
Another method is the difference of row number:
select value, min(id) as from_id, max(id) as to_id
from (select t.*,
(row_number() over (order by id) -
row_number() over (partition by val order by id
) as grp
from table t
) t
group by grp, value;
Using a CTE to collect all the rows and identifying them into changing values, then finally grouping together for the changing values.
CREATE TABLE #temp (
ID INT NOT NULL IDENTITY(1,1),
[Value] INT NOT NULL
)
GO
INSERT INTO #temp ([Value])
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 2 UNION ALL
SELECT 1;
WITH Marked AS (
SELECT
*,
grp = ROW_NUMBER() OVER (ORDER BY ID)
- ROW_NUMBER() OVER (PARTITION BY Value ORDER BY ID)
FROM #temp
)
SELECT MIN(ID) AS [From], MAX(ID) AS [To], [VALUE]
FROM Marked
GROUP BY grp, Value
ORDER BY MIN(ID)
DROP TABLE #temp;

Teradata SQL: Max (greatest), 2nd and 3rd greatest column names

I have a table with 6 columns in Teradata as follows:
ID Feature1 Feature2 Feature3 Feature4 Feature5
1 12 15 1 22 350
2 121 0.9 999 756 879
...
I need to get the column names for the greatest, 2nd greatest and 3rd greatest values per row, so, I need output that looks like this:
ID Greatest 2nd_Greatest 3rd_Greatest
1 Feature5 Feature4 Feature2
2 Feature3 Feature5 Feature4
Can someone help please.
Thank you!
You can do this with a massive case statement, which gets even more complicated if any of the values are NULL. That would be the fastest way, though.
The easiest method might be to unpivot the data and re-summarize it:
select id,
max(case when seqnum = 1 then feature end) as greatest_feature,
max(case when seqnum = 2 then feature end) as greatest_feature2,
max(case when seqnum = 3 then feature end) as greatest_feature3,
max(case when seqnum = 1 then which end) as which_1,
max(case when seqnum = 2 then which end) as which_2,
max(case when seqnum = 3 then which end) as which_3
from (select id, feature, row_number() over (partition by id order by feature desc) as serqnum
from ((select id, feature1 as feature, 'feature1' as which from table) union all
(select id, feature2 as feature, 'feature2' as which from table) union all
(select id, feature3 as feature, 'feature3' as which from table) union all
(select id, feature4 as feature, 'feature4' as which from table) union all
(select id, feature5 as feature, 'feature5' as which from table) union all
(select id, feature6 as feature, 'feature6' as which from table)
) t
) t
group by id;
Refining Gordon's query:
Instead of several passes over the source table for those UNIONs you can create a list of features and then cross join it:
SELECT t.id, f.feature,
CASE f.feature
WHEN 'feature1' THEN t.feature1
WHEN 'feature2' THEN t.feature2
WHEN 'feature3' THEN t.feature3
WHEN 'feature4' THEN t.feature4
WHEN 'feature5' THEN t.feature5
END AS val
FROM tab AS t CROSS JOIN
(
SELECT * FROM (SELECT 'feature1' AS feature) AS dt
UNION ALL
SELECT * FROM (SELECT 'feature2' AS feature) AS dt
UNION ALL
SELECT * FROM (SELECT 'feature3' AS feature) AS dt
UNION ALL
SELECT * FROM (SELECT 'feature4' AS feature) AS dt
UNION ALL
SELECT * FROM (SELECT 'feature5' AS feature) AS dt
) AS f
You can create the list on the fly like above using UNIONs or as a real table.
Starting with TD14.10 there's also a TD_UNPIVOT table operator (but still no PIVOT):
SELECT *
FROM TD_UNPIVOT
(
ON (SELECT id, feature1, feature2, feature3, feature4, feature5 FROM tab)
USING
VALUE_COLUMNS('val')
UNPIVOT_COLUMN('feature')
COLUMN_LIST('feature1', 'feature2', 'feature3', 'feature4', 'feature5')
) AS dt
Also starting with TD14.10 there's LAST_VALUE which can be used for finding nth-greatest value together with the ROW_NUMBER, thus avoiding the final aggregation:
SELECT id,
feature AS "Greatest",
LAST_VALUE(feature)
OVER (PARTITION BY id ORDER BY val DESC
ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS "2nd_Greatest",
LAST_VALUE(feature)
OVER (PARTITION BY id ORDER BY val DESC
ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING) AS "3rd_Greatest"
FROM TD_UNPIVOT
(
ON (SELECT id, feature1, feature2, feature3, feature4, feature5 FROM tab)
USING
VALUE_COLUMNS('val')
UNPIVOT_COLUMN('feature')
COLUMN_LIST('feature1', 'feature2', 'feature3', 'feature4', 'feature5')
) AS dt
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY val DESC) = 1;

How do I get records before and after given one?

I have the following table structure:
Id, Message
1, John Doe
2, Jane Smith
3, Error
4, Jane Smith
Is there a way to get the error record and the surrounding records? i.e. find all Errors and the record before and after them.
;WITH numberedlogtable AS
(
SELECT Id,Message,
ROW_NUMBER() OVER (ORDER BY ID) AS RN
FROM logtable
)
SELECT Id,Message
FROM numberedlogtable
WHERE RN IN (SELECT RN+i
FROM numberedlogtable
CROSS JOIN (SELECT -1 AS i UNION ALL SELECT 0 UNION ALL SELECT 1) n
WHERE Message='Error')
WITH err AS
(
SELECT TOP 1 *
FROM log
WHERE message = 'Error'
ORDER BY
id
),
p AS
(
SELECT TOP 1 l.*
FROM log
WHERE id <
(
SELECT id
FROM err
)
ORDER BY
id DESC
)
SELECT TOP 3 *
FROM log
WHERE id >
(
SELECT id
FROM p
)
ORDER BY
id
Adapt this routine to pick out your target.
DECLARE #TargetId int
SET #TargetId = 3
select *
from LogTable
where Id in (-- "before"
select max(Id)
from LogTable
where Id < #TargetId
-- target
union all select #TargetId
-- "after"
union all select min(Id)
from LogTable
where Id > #TargetId)
select id,messag from
(Select (Row_Number() over (order by ID)) as RNO, * from #Temp) as A,
(select SubRNO-1 as A,
SubRNO as B,
SubRNO+1 as C
from (Select (Row_Number() over (order by ID)) as SubRNO, * from #Temp) as C
where messag = 'Error') as B
where A.RNO = B.A or A.RNO = B.B or A.RNO = B.C
;WITH Logs AS
(
SELECT ROW_NUMBER() OVER (ORDER BY id), id, message as rownum FROM LogTable lt
)
SELECT curr.id, prev.id, next.id
FROM Logs curr
LEFT OUTER JOIN Logs prev ON curr.rownum+1=prev.rownum
RIGHT OUTER JOIN Logs next ON curr.rownum-1=next.rownum
WHERE curr.message = 'Error'
select id, message from tbl where id in (
select id from tbl where message = "error"
union
select id-1 from tbl where message = "error"
union
select id+1 from tbl where message = "error"
)
Get fixed number of rows before & after target
Using UNION for a simple, high performance query (I found selected answer WITH query above to be extremely slow)
Here is a high performance alternative to the WITH top selected answer, when you know an ID or specific identifier for a given record, and you want to select a fixed number of records BEFORE and AFTER that record. Requires a number field for ID, or something like date that can be sorted ascending / descending.
Example: You want to select the 10 records before and after a specific error was recorded, you know the error ID, and can sort by date or ID.
The following query gets (inclusive) the 1 result above, the identified record itself, and the 1 record below. After the UNION, the results are sorted again in descending order.
SELECT q.*
FROM(
SELECT TOP 2
id, content
FROM
the_table
WHERE
id >= [ID]
ORDER BY id ASC
UNION
SELECT TOP 1
id, content
FROM
the_table
WHERE
id < [ID]
ORDER BY id DESC
) q
ORDER BY q.id DESC