Top n distinct values of one column in Oracle - sql

I'm using a query where a part of it gets the top 3 of a certain column.
It builds a subquery of the distinct values of the column, limited to 3 rows, and then filters the main query by those values to get the top 3.
WITH subquery AS (
SELECT col FROM (
SELECT DISTINCT col
FROM tbl
) WHERE ROWNUM <= 3
)
SELECT col
FROM tbl
WHERE tbl.col IN (SELECT col FROM subquery)
So the original table is like this:
col
-----
a
a
a
b
b
b
c
d
d
e
f
f
f
f
And the query returns every row for the top 3 values of the column (not the top 3 rows, which would all be a):
col
-----
a
a
a
b
b
b
c
I'm trying to learn whether there is a more correct way of doing this. The real query is big, and doubling its size with a subquery that looks almost the same, just to get the top 3, makes it hard to understand and modify.
Is there a better way to get the top 3 distinct values of one column in Oracle?

Yes, you can use dense_rank and avoid duplicated code:
select col
from (select col, dense_rank() over (order by col) rnk from tbl)
where rnk <= 3
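To see why dense_rank returns all rows of the first 3 distinct values rather than the first 3 rows, here is a minimal sketch that runs the same query on the sample data from the question. It uses Python's built-in sqlite3 module instead of Oracle (an assumption made only so the check is self-contained; window functions need SQLite 3.25 or later), and the helper name top_n_distinct is made up for illustration.

```python
import sqlite3

def top_n_distinct(values, n):
    """Return every row whose value is among the n smallest distinct values."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Oracle
    conn.execute("CREATE TABLE tbl (col TEXT)")
    conn.executemany("INSERT INTO tbl (col) VALUES (?)", [(v,) for v in values])
    # dense_rank() gives equal values the same rank and leaves no gaps,
    # so rnk <= n keeps all rows of the first n distinct values.
    rows = conn.execute(
        """
        SELECT col
        FROM (SELECT col, dense_rank() OVER (ORDER BY col) AS rnk FROM tbl)
        WHERE rnk <= ?
        ORDER BY col
        """,
        (n,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

sample = list("aaabbbcddeffff")  # the 14 rows from the question
result = top_n_distinct(sample, 3)
print(result)
```

With the question's data this should give back the three a rows, the three b rows, and the single c row.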

Related

Selecting rows with shared values in two distinct fields

I searched around for this but all I can find are answers on how to select rows with the same value in both fields. I'm trying to select rows using PostgreSQL that share values in two fields with any other row in the table.
As an example:
id col1 col2
1 A X
2 A Y
3 A X
4 B Y
5 B Y
6 B X
In this case I'd want to select rows 1, 3, 4 and 5. Thanks in advance!
Use window functions:
select t.*
from (select t.*, count(*) over (partition by col1, col2) as cnt
from t
) t
where cnt > 1;
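As a quick check of the window-function approach, the sketch below loads the example rows and runs the same query. It assumes SQLite (3.25 or later) through Python's sqlite3 module rather than PostgreSQL, purely to keep the example self-contained; the variable name shared_ids is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL
conn.execute("CREATE TABLE t (id INTEGER, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", "X"), (2, "A", "Y"), (3, "A", "X"),
     (4, "B", "Y"), (5, "B", "Y"), (6, "B", "X")],
)

# count(*) OVER (PARTITION BY col1, col2) attaches to every row the number
# of rows that share its (col1, col2) pair; cnt > 1 keeps the shared ones.
shared_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM (SELECT t.*, count(*) OVER (PARTITION BY col1, col2) AS cnt
              FROM t) AS sub
        WHERE cnt > 1
        ORDER BY id
        """
    )
]
print(shared_ids)
```

Rows 2 and 6 are the only ones with a unique (col1, col2) pair, so they are the ones filtered out.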

Keep Track of already summed tuples sql

If we have a table with values for a and b, is there a way to add up the b's only when the a is not a duplicate? For example:
a b
1 2
2 3
2 3
so we would get only 5 (instead of 8)
A sort of
select sum(b if unique a),
from table
where ...
The following query selects the lowest value of b for each group a
select min(b) min_b
from mytable
group by a
You can then sum those values by selecting the sum from a derived table
select sum(min_b) from (
select min(b) min_b
from mytable
group by a
) t
http://sqlfiddle.com/#!9/d82c5/1
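For a self-contained check of the derived-table approach, here is a small sketch using Python's sqlite3 module with the three rows from the question (SQLite is an assumption; the question does not name a database). The variable names naive_total and dedup_total are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 2), (2, 3), (2, 3)])

# A plain sum counts the duplicated a = 2 row twice.
naive_total = conn.execute("SELECT sum(b) FROM mytable").fetchone()[0]

# Collapse each a to a single b first, then sum the collapsed values.
dedup_total = conn.execute(
    """
    SELECT sum(min_b)
    FROM (SELECT min(b) AS min_b FROM mytable GROUP BY a) AS t
    """
).fetchone()[0]

print(naive_total, dedup_total)
```

Note that min(b) silently picks one value when the same a appears with different b values, so this only matches the intent when duplicates of a always carry the same b.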
You haven't specified your RDBMS, but if you are using a database that supports window functions, such as SQL Server, you can select the unique rows first with a WITH clause and the ROW_NUMBER() function, and then take the SUM of those.
;WITH C AS(
SELECT a, b,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY a) AS Rn
FROM Table1
)
SELECT SUM(b) FROM C
WHERE Rn = 1

Merge multiple rows with same ID into one row

How can I merge multiple rows with the same ID into one row?
I want to merge when the values in the first and second rows of the same column are equal, or when there is a value in the first row and NULL in the second row.
I don't want to merge when the values in the first and second rows of the same column are different.
I have this table:
ID |A |B |C
1 NULL 31 NULL
1 412 NULL 1
2 567 38 4
2 567 NULL NULL
3 2 NULL NULL
3 5 NULL NULL
4 6 1 NULL
4 8 NULL 5
4 NULL NULL 5
I want to get this table:
ID |A |B |C
1 412 31 1
2 567 38 4
3 2 NULL NULL
3 5 NULL NULL
4 6 1 NULL
4 8 NULL 5
4 NULL NULL 5
I think there's a simpler solution than the above answers (which are also correct). It first builds the rows that can be merged within a CTE, then combines them with the rows that can't be merged.
WITH CTE AS (
SELECT
ID,
MAX(A) AS A,
MAX(B) AS B,
MAX(C) AS C
FROM dbo.Records
GROUP BY ID
HAVING MAX(A) = MIN(A)
AND MAX(B) = MIN(B)
AND MAX(C) = MIN(C)
)
SELECT *
FROM CTE
UNION ALL
SELECT *
FROM dbo.Records
WHERE ID NOT IN (SELECT ID FROM CTE)
SQL Fiddle: http://www.sqlfiddle.com/#!6/29407/1/0
WITH Collapsed AS (
SELECT
ID,
A = Min(A),
B = Min(B),
C = Min(C)
FROM
dbo.MyTable
GROUP BY
ID
HAVING
EXISTS (
SELECT Min(A), Min(B), Min(C)
INTERSECT
SELECT Max(A), Max(B), Max(C)
)
)
SELECT
*
FROM
Collapsed
UNION ALL
SELECT
*
FROM
dbo.MyTable T
WHERE
NOT EXISTS (
SELECT *
FROM Collapsed C
WHERE T.ID = C.ID
);
See this working in a SQL Fiddle
This works by creating all the mergeable rows through the use of Min and Max--which should be the same for each column within an ID and which usefully exclude NULLs--then appending to this list all the rows from the table that couldn't be merged. The special trick with EXISTS ... INTERSECT allows for the case when a column has all NULL values for an ID (and thus the Min and Max are NULL and can't equal each other). That is, it functions like Min(A) = Max(A) AND Min(B) = Max(B) AND Min(C) = Max(C) but allows for NULLs to compare as equal.
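The difference between a plain Min = Max test and a NULL-safe one can be sketched as follows. This is not the EXISTS ... INTERSECT query itself: it swaps in SQLite's IS operator (which treats NULL IS NULL as true) as a stand-in, run through Python's sqlite3 module. The table name records, the helper merge_rows, and the extra ID 5 rows are hypothetical additions for illustration; ID 5 is not part of the question's data.

```python
import sqlite3

ROWS = [
    (1, None, 31, None), (1, 412, None, 1),
    (2, 567, 38, 4), (2, 567, None, None),
    (3, 2, None, None), (3, 5, None, None),
    (4, 6, 1, None), (4, 8, None, 5), (4, None, None, 5),
    # Hypothetical ID, not in the question: column b is NULL in every row.
    (5, 9, None, None), (5, None, None, 7),
]

QUERY = """
WITH collapsed AS (
    SELECT id, min(a) AS a, min(b) AS b, min(c) AS c
    FROM records
    GROUP BY id
    HAVING min(a) {eq} max(a) AND min(b) {eq} max(b) AND min(c) {eq} max(c)
)
SELECT * FROM collapsed
UNION ALL
SELECT * FROM records WHERE id NOT IN (SELECT id FROM collapsed)
"""

def merge_rows(eq):
    """Run the merge with the given comparison operator ('=' or 'IS')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY.format(eq=eq)).fetchall()
    conn.close()
    return rows

plain = merge_rows("=")       # NULL = NULL is unknown, so ID 5 is not merged
null_safe = merge_rows("IS")  # NULL IS NULL is true, so ID 5 is merged
print(len(plain), len(null_safe))
```

With the plain comparison, the two ID 5 rows stay separate because b is NULL throughout; with the NULL-safe comparison they collapse into a single row, while IDs 1 to 4 behave the same either way.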
Here's a slightly different (earlier) solution I gave that may offer different performance characteristics. Being more complicated, I like it less, but as a single flowing query (without a UNION), I kind of like it more, too.
WITH Collapsible AS (
SELECT
ID
FROM
dbo.MyTable
GROUP BY
ID
HAVING
EXISTS (
SELECT Min(A), Min(B), Min(C)
INTERSECT
SELECT Max(A), Max(B), Max(C)
)
), Calc AS (
SELECT
T.*,
Grp = Coalesce(C.ID, Row_Number() OVER (PARTITION BY T.ID ORDER BY (SELECT 1)))
FROM
dbo.MyTable T
LEFT JOIN Collapsible C
ON T.ID = C.ID
)
SELECT
ID,
A = Min(A),
B = Min(B),
C = Min(C)
FROM
Calc
GROUP BY
ID,
Grp
;
This is also in the above SQL Fiddle.
This uses similar logic as the first query to calculate whether a group should be merged, then uses this to create a grouping key that is either the same for all rows within an ID or is different for all rows within an ID. With a final Min (Max would have worked just as well) the rows that should be merged are merged because they share a grouping key, and the rows that shouldn't be merged are not because they have distinct grouping keys over the ID.
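The grouping-key idea can be tried out with the sketch below, again under the assumption of SQLite (3.25 or later) via Python's sqlite3 module rather than SQL Server. The NULL-safe test uses SQLite's IS operator in place of EXISTS ... INTERSECT, and the names records, k, and merged are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [(1, None, 31, None), (1, 412, None, 1),
     (2, 567, 38, 4), (2, 567, None, None),
     (3, 2, None, None), (3, 5, None, None),
     (4, 6, 1, None), (4, 8, None, 5), (4, None, None, 5)],
)

merged = conn.execute(
    """
    WITH collapsible AS (
        SELECT id
        FROM records
        GROUP BY id
        HAVING min(a) IS max(a) AND min(b) IS max(b) AND min(c) IS max(c)
    ), calc AS (
        -- grp is the id itself for mergeable ids (one shared key per id)
        -- and a per-row number otherwise (a distinct key per row).
        SELECT t.*,
               coalesce(k.id, row_number() OVER (PARTITION BY t.id)) AS grp
        FROM records AS t
        LEFT JOIN collapsible AS k ON t.id = k.id
    )
    SELECT id, min(a), min(b), min(c)
    FROM calc
    GROUP BY id, grp
    """
).fetchall()
print(sorted(merged, key=lambda r: r[0]))
```

IDs 1 and 2 come back as one row each, and the rows for IDs 3 and 4 come back unchanged, which matches the table the question asks for.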
Depending on your data set, indexes, table size, and other performance factors, either of these queries may perform better, though the second query has some work to do to catch up, with two sorts instead of one.
You can try something like this:
select
isnull(t1.A, t2.A) as A,
isnull(t1.B, t2.B) as B,
isnull(t1.C, t2.C) as C
from
table_name t1
join table_name t2 on t1.ID = t2.ID and .....
You mention the concepts of first and second. How do you define this order? Place that order-defining condition in here: .....
Also, I assume you have exactly 2 rows for each ID value.

get subset of a table in SQL

I want to get a subset of a table, here's the example:
1 A
2 A
3 B
4 B
5 C
6 D
7 D
8 D
I want to get the unique record, but with the smallest id:
1 A
3 B
5 C
6 D
How can I write the SQL in SQL Server? Thanks!
Use a common-table expression like this:
;WITH DataCTE AS
(
SELECT ID, OtherCol,
ROW_NUMBER() OVER(PARTITION BY OtherCol ORDER BY ID) AS RowNum
FROM dbo.YourTable
)
SELECT *
FROM DataCTE
WHERE RowNum = 1
This "partitions" your data by the second column you have (A, B, C, D) and orders each partition by the ID (1, 2, 3, ...), smallest ID first.
Therefore, within each "partition" (i.e. each value of your second column), the entry with RowNum = 1 is the one with the smallest ID.
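A small sketch can confirm that the row-number approach and a plain GROUP BY with MIN pick the same rows for this data. It assumes SQLite (3.25 or later) through Python's sqlite3 module instead of SQL Server, and the names your_table and other_col are guesses, since the question does not give column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not SQL Server
conn.execute("CREATE TABLE your_table (id INTEGER, other_col TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "B"),
     (5, "C"), (6, "D"), (7, "D"), (8, "D")],
)

# Number the rows inside each other_col partition by ascending id,
# then keep only the first row of every partition.
via_row_number = conn.execute(
    """
    WITH data_cte AS (
        SELECT id, other_col,
               row_number() OVER (PARTITION BY other_col ORDER BY id) AS row_num
        FROM your_table
    )
    SELECT id, other_col FROM data_cte WHERE row_num = 1 ORDER BY other_col
    """
).fetchall()

# Same result with an aggregate, which is enough when only id is needed.
via_group_by = conn.execute(
    "SELECT min(id), other_col FROM your_table GROUP BY other_col ORDER BY other_col"
).fetchall()

print(via_row_number)
```

The row-number form is the more useful one when the table has further columns that should come from the same row as the smallest ID.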
select min(id), othercol
from thetable
group by othercol
and maybe with
order by othercol
... at the end if that's important
Try this:
SELECT MIN(Id) AS Id, Name
FROM MyTable
GROUP BY Name
select min(id), column2
from table
group by column2
It helps if you provide the table information in the question - I've just guessed at the column names...

Select database rows in range

I want to select the rows between A and B from a table. The table has at least A rows but it might have fewer than B rows.
For example if A = 2, B = 5 and the table has 3 rows it should return rows 2 and 3.
How could I get the rows in such a range?
I am using Microsoft SQL Server 2008.
You can use something similar to what's being described in this SO question.
For example:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY YOUR_ORDERED_FIELD) as row FROM YOUR_TABLE
) a WHERE row >= 5 and row <= 10
Where A = 5 and B = 10 in this example.
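To check the behaviour when the table has fewer than B rows, here is a minimal sketch using the case from the question (A = 2, B = 5, three rows). It runs on SQLite (3.25 or later) through Python's sqlite3 module rather than SQL Server 2008, and the table items, its name column, and the helper rows_between are all hypothetical.

```python
import sqlite3

def rows_between(conn, a, b):
    """Return the rows numbered a..b (1-based, inclusive) in name order."""
    cur = conn.execute(
        """
        SELECT name
        FROM (SELECT name, row_number() OVER (ORDER BY name) AS rn
              FROM items) AS numbered
        WHERE rn BETWEEN ? AND ?
        ORDER BY rn
        """,
        (a, b),
    )
    return [r[0] for r in cur]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("x",), ("y",), ("z",)])

page = rows_between(conn, 2, 5)
print(page)
```

Because the filter is applied to the computed row number, asking for rows up to 5 from a 3-row table simply returns rows 2 and 3 without an error.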
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY ordercol) AS rn
FROM [table]
) AS numbered
WHERE rn BETWEEN @a AND @b