Effective way of locating top ranked rows on Oracle DB - sql

I have a large table (millions of records) and I need to write an efficient select statement.
The table looks like this:
create table tab1 (
pt_key number
, cp_key number
, ext_info varchar2(10)
, resp_nm varchar2(20)
, resp_dttm date
, rank number
);
Sample records:
insert into tab1 values (1,1,'info1','OK', to_date('01.03.18 17:00:00','DD.MM.RR HH24:MI:SS'),1);
insert into tab1 values (1,1,'info2','FAILED', to_date('01.03.18 17:00:00','DD.MM.RR HH24:MI:SS'),2);
insert into tab1 values (1,1,'info3','SENT', to_date('01.03.18 17:00:00','DD.MM.RR HH24:MI:SS'),3);
insert into tab1 values (1,1,'info4','SENT', to_date('02.03.18 17:00:00','DD.MM.RR HH24:MI:SS'),3);
insert into tab1 values (1,2,'info5','OK', to_date('05.03.18 17:00:00','DD.MM.RR HH24:MI:SS'),1);
insert into tab1 values (1,2,'info6','OK', to_date('06.03.18 17:00:00','DD.MM.RR HH24:MI:SS'),1);
insert into tab1 values (1,2,'info7','FAILED', to_date('01.03.18 17:00:00','DD.MM.RR HH24:MI:SS'),2);
I would like the query to return, for each combination of pt_key and cp_key (part of the composite primary key; the other columns are not indexed), the record with the highest rank. If several records in a combination share the same highest rank, pick the one with the greatest resp_dttm.
The select statement should return only the first four columns.
For the above posted sample data the desired result would be:
1 1 info4 SENT
1 2 info7 FAILED
Thanks for help.

Here's one approach using row_number():
select pt_key, cp_key, ext_info, resp_nm
from (
select t.*, row_number() over (partition by pt_key, cp_key
order by rank desc, resp_dttm desc) rn
from tab1 t
)
where rn = 1
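The row_number() approach can be tried out without an Oracle instance. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption, and it needs SQLite 3.25 or newer for window functions), and it stores resp_dttm as ISO text instead of a DATE.

```python
import sqlite3

# SQLite stands in for Oracle here (an assumption); window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tab1 (pt_key int, cp_key int, ext_info text, "
    "resp_nm text, resp_dttm text, rank int)"
)
# Same sample rows as the question; dates kept as ISO text so they sort chronologically.
conn.executemany(
    "insert into tab1 values (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "info1", "OK", "2018-03-01 17:00:00", 1),
        (1, 1, "info2", "FAILED", "2018-03-01 17:00:00", 2),
        (1, 1, "info3", "SENT", "2018-03-01 17:00:00", 3),
        (1, 1, "info4", "SENT", "2018-03-02 17:00:00", 3),
        (1, 2, "info5", "OK", "2018-03-05 17:00:00", 1),
        (1, 2, "info6", "OK", "2018-03-06 17:00:00", 1),
        (1, 2, "info7", "FAILED", "2018-03-01 17:00:00", 2),
    ],
)

# Number the rows inside each (pt_key, cp_key) group, best row first, then keep row 1.
top_rows = conn.execute(
    """
    select pt_key, cp_key, ext_info, resp_nm
    from (
        select t.*,
               row_number() over (partition by pt_key, cp_key
                                  order by rank desc, resp_dttm desc) as rn
        from tab1 t
    )
    where rn = 1
    order by pt_key, cp_key
    """
).fetchall()

for row in top_rows:
    print(row)
```

On the sample data this yields the info4/SENT row for (1, 1) and the info7/FAILED row for (1, 2), matching the desired result in the question.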

Here's another approach using FIRST aggregate function:
select pt_key,
cp_key,
max(ext_info) keep (dense_rank first order by t.rank desc, t.resp_dttm desc) as ext_info,
max(resp_nm) keep (dense_rank first order by t.rank desc, t.resp_dttm desc) as resp_nm
from tab1 t
group by pt_key, cp_key
Here's how it works on Oracle Live SQL
EDIT 2:
Result:
PT_KEY | CP_KEY | EXT_INFO | RESP_NM
--------+--------+----------+---------
1 | 1 | info4 | SENT
1 | 2 | info7 | FAILED
EDIT 1:
This solution has an important drawback: if, for a certain combination of pt_key and cp_key, multiple rows share the same rank and resp_dttm values, it will "combine" those rows and calculate the aggregates for ext_info and resp_nm across them (in my example it takes the max value of each).
You can refine that behavior by adding tertiary sort criteria to make the ranking distinct (e.g. add all other columns from the primary key).
The answer from @sgeddes is a bit better in that sense: it uses one (arbitrary) row from the equally ranked rows, without combining the data and without extra sorting criteria. It is also easier to maintain, as it has the ranking criteria in one place, while mine has them in two.
You should probably test performance of both in your specific scenario (e.g. specific indices, specific data profile/statistics).

Related

Add column to ensure composite key is unique

I have a table which needs to have a composite primary key based on 2 columns (Material number, Plant).
For example, this is how it is currently (note that these rows are not unique):
MATERIAL_NUMBER PLANT NUMBER
------------------ ----- ------
000000000000500672 G072 1
000000000000500672 G072 1
000000000000500672 G087 1
000000000000500672 G207 1
000000000000500672 G207 1
However, I'll need to add the additional column (NUMBER) to the composite key such that each row is unique, and it must work like this:
For each MATERIAL_NUMBER, for each PLANT, let NUMBER start at 1 and increment by 1 for each duplicate record.
This would be the desired output:
MATERIAL_NUMBER PLANT NUMBER
------------------ ----- ------
000000000000500672 G072 1
000000000000500672 G072 2
000000000000500672 G087 1
000000000000500672 G207 1
000000000000500672 G207 2
How would I go about achieving this, specifically in SQL Server?
Best Regards!
SOLVED.
See below:
SELECT MATERIAL_NUMBER, PLANT, (ROW_NUMBER() OVER (PARTITION BY MATERIAL_NUMBER, PLANT ORDER BY VALID_FROM)) as NUMBER
FROM Table_Name
Will output the table in question, with the NUMBER column properly defined
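To see the numbering in action, here is a small sketch using Python's sqlite3 module rather than SQL Server (an assumption, requiring SQLite 3.25+). The sample rows have no VALID_FROM column, so the sketch orders duplicates by SQLite's implicit rowid, which is purely an illustrative tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table materials (material_number text, plant text)")
conn.executemany(
    "insert into materials values (?, ?)",
    [
        ("000000000000500672", "G072"),
        ("000000000000500672", "G072"),
        ("000000000000500672", "G087"),
        ("000000000000500672", "G207"),
        ("000000000000500672", "G207"),
    ],
)

# Restart the counter at 1 for every (material_number, plant) pair.
# rowid is an assumed ordering column standing in for VALID_FROM.
numbered = conn.execute(
    """
    select material_number, plant,
           row_number() over (partition by material_number, plant
                              order by rowid) as number
    from materials
    order by material_number, plant, number
    """
).fetchall()

for row in numbered:
    print(row)
```

Each duplicate within a pair gets the next integer, so (material_number, plant, number) is unique across the result.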
Suppose this is actual table,
create table #temp1(MATERIAL_NUMBER varchar(30),PLANT varchar(30), NUMBER int)
Suppose you want to insert only single record then,
declare @Num int
select @Num=isnull(max(number),0) from #temp1 where MATERIAL_NUMBER='000000000000500672' and PLANT='G072'
insert into #temp1 (MATERIAL_NUMBER,PLANT , NUMBER )
values ('000000000000500672','G072',@Num+1)
Suppose you want to insert records in bulk. Your bulk sample data looks like this:
create table #temp11(MATERIAL_NUMBER varchar(30),PLANT varchar(30))
insert into #temp11 (MATERIAL_NUMBER,PLANT)values
('000000000000500672','G072')
,('000000000000500672','G072')
,('000000000000500672','G087')
,('000000000000500672','G207')
,('000000000000500672','G207')
You want to insert `#temp11` in `#temp1` maintaining number id
insert into #temp1 (MATERIAL_NUMBER,PLANT , NUMBER )
select t11.MATERIAL_NUMBER,t11.PLANT
,ROW_NUMBER()over(partition by t11.MATERIAL_NUMBER,t11.PLANT order by (select null))+isnull(maxnum,0) as Number from #temp11 t11
outer apply(select MATERIAL_NUMBER,PLANT,max(NUMBER)maxnum from #temp1 t where t.MATERIAL_NUMBER=t11.MATERIAL_NUMBER
and t.PLANT=t11.PLANT group by MATERIAL_NUMBER,PLANT) t
select * from #temp1
drop table #temp1
drop table #temp11
The main question is: why do you need the NUMBER column at all? In most cases you don't; you can compute ROW_NUMBER()over(partition by t11.MATERIAL_NUMBER,t11.PLANT order by (select null)) on the fly wherever you need to display it. This will be more efficient.
Otherwise, describe the actual situation and the number of rows involved where you need the NUMBER column.

Rotate rows into columns with column names not coming from the row

I've looked at some answers but none of them seem to be applicable to me.
Basically I have this result set:
RowNo | Id | OrderNo |
1 101 1
2 101 10
I just want to convert this to
| Id | OrderNo_0 | OrderNo_1 |
101 1 10
I know I should probably use PIVOT. But the syntax is just not clear to me.
There are always exactly two order numbers per Id.
And if you want to use PIVOT then the following works with the data provided:
declare @Orders table (RowNo int, Id int, OrderNo int)
insert into @Orders (RowNo, Id, OrderNo)
select 1, 101, 1 union all select 2, 101, 10
select Id, [1] OrderNo_0, [2] OrderNo_1
from (
select RowNo, Id, OrderNo
from @Orders
) SourceTable
pivot (
sum(OrderNo)
for RowNo in ([1],[2])
) as PivotTable
Reference: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
Note: To build each row in the result set, PIVOT groups by the columns that are not being pivoted, so the pivoted column needs an aggregate function. You won't notice it in this instance because the rows are unique to start with, but if several rows shared the same RowNo and Id, the aggregation would come into play.
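The effect of that aggregate is easier to see with a duplicate row. SQLite has no PIVOT operator, so the sketch below swaps in conditional aggregation (SUM over a CASE expression), which groups by Id in the same way; the table name orders and the extra duplicate row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (RowNo int, Id int, OrderNo int)")
conn.executemany("insert into orders values (?, ?, ?)", [(1, 101, 1), (2, 101, 10)])

# Conditional aggregation: one output column per RowNo value, grouped by Id.
pivot_sql = """
    select Id,
           sum(case when RowNo = 1 then OrderNo end) as OrderNo_0,
           sum(case when RowNo = 2 then OrderNo end) as OrderNo_1
    from orders
    group by Id
"""
unique_rows = conn.execute(pivot_sql).fetchall()
print(unique_rows)

# Hypothetical duplicate: a second row with the same RowNo and Id.
conn.execute("insert into orders values (2, 101, 5)")
with_duplicate = conn.execute(pivot_sql).fetchall()
print(with_duplicate)
```

With unique rows the sum simply passes each value through; once the duplicate is added, OrderNo_1 becomes the sum of both values, which is the aggregation the note describes.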
As you say there are only ever two order numbers per ID, you could join the results set to itself on the ID column. For the purposes of the example below, I'm assuming your results set is merely selecting from a single Orders table, but it should be easy enough to replace this with your existing query.
SELECT o1.ID, o1.OrderNo AS [OrderNo_0], o2.OrderNo AS [OrderNo_1]
FROM Orders AS o1
INNER JOIN Orders AS o2
ON (o1.ID = o2.ID AND o1.OrderNo < o2.OrderNo)
Given your sample data, the simplest option is to use the MIN and MAX functions.
SELECT Id,min(OrderNo) OrderNo_0,MAX(OrderNo) OrderNo_1
FROM T
GROUP BY Id

Get distinct individual column values (not distinct pairs) from two tables in single query

I have two tables like the following. One is for sport talents of some people and second for arts talents. One may not have a sport talent to list and same applies for art talent.
CREATE TABLE SPORT_TALENT(name varchar(10), TALENT varchar(10));
CREATE TABLE ART_TALENT(name varchar(10), TALENT varchar(10));
INSERT INTO SPORT_TALENT(name, TALENT) VALUES
('Steve', 'Footbal')
,('Steve', 'Golf')
,('Bob' , 'Golf')
,('Mary' , 'Tennnis');
INSERT INTO ART_TALENT(name, TALENT) VALUES
('Steve', 'Dancer')
, ('Steve', 'Singer')
, ('Bob' , 'Dancer')
, ('Bob' , 'Singer')
, ('John' , 'Dancer');
Now I want to list down sport talent and art talent of one person. I would like to avoid duplication. But I don't mind if there is a "null" in any output. I tried the following
select distinct sport_talent.talent as s_talent,art_talent.talent as a_talent
from sport_talent
JOIN art_talent on sport_talent.name=art_talent.name
where (sport_talent.name='Steve' or art_talent.name='Steve');
s_talent | a_talent
----------+----------
Footbal | Dancer
Golf | Singer
Footbal | Singer
Golf | Dancer
I would like to avoid redundancy and need something like the following (distinct values of sport talents + distinct values of art talents).
s_talent | a_talent
----------+----------
Footbal | Dancer
Golf | Singer
As mentioned in subject, I am not looking for distinct combinations. But at the same time, it's OK if there are some records with "null" value in one column. I am relatively new to SQL.
Try:
SELECT s_talent, a_talent
FROM (
SELECT distinct on (talent) talent as s_talent,
dense_rank() over (order by talent) as x
FROM SPORT_TALENT
WHERE name='Steve'
) x
FULL OUTER JOIN (
SELECT distinct on (talent) talent as a_talent,
dense_rank() over (order by talent) as x
FROM ART_TALENT
WHERE name='Steve'
) y
ON x.x = y.x
Demo: http://sqlfiddle.com/#!15/66e04/3
There are no duplicates in your query result. Each of the four records it returns is unique. The result may not be what you want, but duplication is not the problem.
Postgres 9.4
... introduces unnest() with multiple arguments. Does exactly what you want, and should be fast, too. Per documentation:
The special table function UNNEST may be called with any number of
array parameters, and it returns a corresponding number of columns, as
if UNNEST (Section 9.18) had been called on each parameter separately
and combined using the ROWS FROM construct.
About ROWS FROM:
Compare result of two table functions using one column from each
SELECT *
FROM unnest(
ARRAY(SELECT DISTINCT talent FROM sport_talent WHERE name = 'Steve')
, ARRAY(SELECT DISTINCT talent FROM art_talent WHERE name = 'Steve')
) AS t(s_talent, a_talent);
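As a rough analogy only (not Postgres itself), the row-by-row pairing that multi-argument unnest() performs resembles itertools.zip_longest in Python: the two lists are lined up by position and the shorter one is padded with None, much as the shorter array is padded with NULL. The helper name talents_side_by_side is hypothetical, and the values are sorted here only to make the output deterministic.

```python
from itertools import zip_longest

# Sample rows copied from the question (spelling kept as posted).
sport_talent = [("Steve", "Footbal"), ("Steve", "Golf"), ("Bob", "Golf"), ("Mary", "Tennnis")]
art_talent = [
    ("Steve", "Dancer"), ("Steve", "Singer"),
    ("Bob", "Dancer"), ("Bob", "Singer"),
    ("John", "Dancer"),
]


def talents_side_by_side(name):
    """Pair distinct sport and art talents by position, padding with None."""
    sports = sorted({talent for person, talent in sport_talent if person == name})
    arts = sorted({talent for person, talent in art_talent if person == name})
    return list(zip_longest(sports, arts))


steve = talents_side_by_side("Steve")
bob = talents_side_by_side("Bob")
print(steve)
print(bob)
```

For Bob, who has one sport talent and two art talents, the second pair carries None in the sport position, which corresponds to the "null" the question says is acceptable.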
Postgres 9.3 or older
SELECT s_talent, a_talent
FROM (
SELECT talent AS s_talent, row_number() OVER () AS rn
FROM sport_talent
WHERE name = 'Steve'
GROUP BY 1
) s
FULL JOIN (
SELECT talent AS a_talent, row_number() OVER () AS rn
FROM art_talent
WHERE name = 'Steve'
GROUP BY 1
) a USING (rn);
Similar previous answers with more explanation:
What type of JOIN to use
Sort columns independently, such that all nulls are last per column
This is similar to what @kordirko posted, but uses GROUP BY to get distinct talents, which is evaluated before window functions. So we only need a bare row_number() and not the more expensive dense_rank().
About the sequence of events in a SELECT query:
Best way to get result count before LIMIT was applied
SQL Fiddle.

A thought experiment in SQL

I want to show how many times each distinct value in a table column appears, alongside the value itself, in a new output table. Is it possible to do this in a single statement rather than racking my brain doing it manually?
Without having actually tried, how about this:
SELECT tmp.DesiredField, (SELECT COUNT(*) FROM [Table] t WHERE t.DesiredField = tmp.DesiredField) AS Count
FROM
(
SELECT DISTINCT DesiredField FROM [Table]
) tmp
This would first select all distinct values from [Table] and in the outer select, take the values and the number of times they appear in the column.
You could also try
SELECT Field, SUM(1) AS Count FROM [Table]
GROUP BY Field
This should "flatten" the table so that it only contains distinct values in Field and the number of rows where Field has the same value.
I just tried the second - it seems to work nicely.
Turns out I was wrong all the time. The second example and the following actually return the same results:
SELECT Field, COUNT(*) AS Count FROM [Table]
GROUP BY Field
It is simplest just to use COUNT(). You'll see variations in what is passed as the count parameter, so here are the options.
DECLARE @tbl TABLE(id INT, data INT)
INSERT INTO @tbl VALUES (1,1),(2,1),(3,2),(4,NULL)
SELECT data
,COUNT(*) Count_star
,COUNT(id) Count_id
,COUNT(data) Count_data
,COUNT(1) Count_literal
FROM @tbl
GROUP BY data
data Count_star Count_id Count_data Count_literal
----------- ----------- ----------- ----------- -------------
NULL 1 1 0 1
1 2 2 2 2
2 1 1 1 1
Warning: Null value is eliminated by an aggregate or other SET operation.
You'll see the difference coming with the treatment of NULL if you COUNT a field that contains NULLs.
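The same comparison can be reproduced outside SQL Server. This sketch uses Python's sqlite3 module (an assumption; the table variable becomes an ordinary table named tbl) and shows that only COUNT(data) skips the NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (id int, data int)")
conn.executemany("insert into tbl values (?, ?)", [(1, 1), (2, 1), (3, 2), (4, None)])

# COUNT(*), COUNT(id) and COUNT(1) count rows; COUNT(data) counts non-NULL values only.
counts = conn.execute(
    """
    select data,
           count(*)    as count_star,
           count(id)   as count_id,
           count(data) as count_data,
           count(1)    as count_literal
    from tbl
    group by data
    order by data
    """
).fetchall()

for row in counts:
    print(row)
```

The group whose data is NULL reports 1 for every variant except COUNT(data), which reports 0, in line with the result table above.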

SQL DISTINCT, GROUP BY OR....?

I have a database with the following columns
SKU | designID | designColor | width | height | price | etc.
SKU number is unique and designID is repeated.
Basically, I want to DISTINCT or GROUP BY designID and get the value of the rest of row even though they are not repeated.
Example:
123 | A-1 | RED | 2 | 3 | $200 | etc.
135 | A-2 | BLU | 8 | 4 | $150 | etc.
After all, I should be able to sort them by either column. I already tried GROUP BY and DISTINCT, but none of them returns the rest of the row's values.
Example:
SELECT DISTINCT designID
FROM tbl_name
Which will return
A-1
A-2
and no other data.
GROUP BY example:
SELECT designID, designColor
FROM tbl_name
GROUP BY designID, designColor
Which will return
A-1 | RED
A-2 | BLU
Any idea so I can have DISTINCT result with all the row values?
Thanks in advance.
====================================
Thanks everybody for all your time and tips, Please let me describe more;
Basically, I need to eliminate the repeated designID values and show just one row for each; it doesn't matter which one, whether first, middle or last. What matters is that the row I show has all the row information, like SKU, price, size, etc. I don't know, maybe I should use something other than DISTINCT or GROUP BY.
Here is what I want from database.
Unless I misunderstand, you can SELECT DISTINCT on multiple columns:
SELECT
DISTINCT designID,
designColor,
width,
height,
price
FROM tbl_name
ORDER BY designColor
This will give you all the unique rows. If you have, for example, two designID values across 15 total rows with 2 and 3 different designColor values respectively, this will give you 5 rows.
If you don't care which row will be returned, you could use MAX and a subquery-group by:
create table #test(
SKU int,
designID varchar(10),
designColor varchar(10),
width int,
height int,
price real,
etc varchar(50)
)
insert into #test values(123, 'A-1' ,'RED', 2, 3, '200', 'etc')
insert into #test values(135, 'A-2' ,'BLUE', 8, 4, '150', 'etc')
insert into #test values(128, 'A-2' ,'YELLOW', 6, 9, '300', 'etc')
select t.* FROM #test t INNER JOIN
(
SELECT MAX(SKU) as MaxSKU,designID
FROM #test
GROUP BY designID
) tt
ON t.SKU = tt.MaxSKU;
drop table #test;
Result:
SKU designID designColor width height price etc
123 A-1 RED 2 3 200 etc
135 A-2 BLUE 8 4 150 etc
If they are all guaranteed to be duplicate (100% i.e. all columns) then a distinct would be your friend. i.e.
SELECT DISTINCT designID, designColor, width, height, price FROM tbl_name
This will give distinct values on everything except SKU (which will always be unique and would foil your distinct).
If you want unique designID values and the other columns differ, then you need to figure out which of the values you want. If you really don't care, you can just arbitrarily pick an aggregate function (say, MIN) and use GROUP BY
i.e.
SELECT designID, MIN(designColor) FROM tbl_name GROUP BY designID
This will give you a unique design id and a value for the other columns.
If you want the row with the biggest SKU for each designID, you could use a ranking function, i.e.
;WITH rankedSKUs
AS
(
SELECT SKU, ROW_NUMBER() OVER(PARTITION BY designID ORDER BY SKU DESC) as id
FROM tbl_name
)
SELECT *
FROM tbl_name T
WHERE EXISTS(SELECT * FROM rankedSKUs where id = 1 and SKU = T.sku)
This will return all columns for each distinct designID, taking the row with the largest SKU as authoritative for each designID.
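A quick way to check the one-row-per-designID idea is to rank SKUs within each designID and keep the first. The sketch below does this with Python's sqlite3 module instead of SQL Server (an assumption, needing SQLite 3.25+), reusing the three sample rows from the #test table earlier in the thread; the table name test is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table test (SKU int, designID text, designColor text, "
    "width int, height int, price real)"
)
conn.executemany(
    "insert into test values (?, ?, ?, ?, ?, ?)",
    [
        (123, "A-1", "RED", 2, 3, 200),
        (135, "A-2", "BLUE", 8, 4, 150),
        (128, "A-2", "YELLOW", 6, 9, 300),
    ],
)

# Rank rows inside each designID by SKU, highest first, and keep the top one.
one_per_design = conn.execute(
    """
    select SKU, designID, designColor
    from (
        select t.*,
               row_number() over (partition by designID order by SKU desc) as rn
        from test t
    )
    where rn = 1
    order by designID
    """
).fetchall()

for row in one_per_design:
    print(row)
```

Without the partition by clause the ranking would run over the whole table and only a single row would come back, so the partition is what makes the result one row per designID.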
If you want return every field, you might as well remove the distinct (assuming you have an id like you seem to).
Your request is really weird because if you take say,
SELECT DISTINCT designID
FROM tbl_name
you get a list of unique design id's, and if you then look up in the table for all rows with those id's, you'll get every single row in the table.
As a side note, the use of distinct usually means you designed your database badly (ie, not normalized) or that you designed your query badly (ie, you know, really badly). My money is on the former.
If you use LINQ you can use something like this:
get_data_context().my_table.GroupBy( t => t.designID ).Select( t => new { t.Key,
REST = t.Select( u => new { u.SKU , u.designID , u.designColor , u.width ,
u.height , u.price } ) } );