Selecting All Rows Matching First 10 IDs in Oracle - sql

This is a pagination related issue in Oracle. I have two tables, LIST and LIST_ITEM with a one-to-many relationship. I'm trying to implement pagination on the number of lists, where each LIST can contain a variable number of LIST_ITEM. Essentially, I need to grab all rows from LIST_ITEM matching the first N LIST ids. Any thoughts on implementing this in Oracle DB? Ideally without adding a separate query.
Previously I was using JPA EntityManager to implement pagination using setFirstResult() and setMaxResults(), but because the number of rows this query should return is variable, that will no longer work for me.

You can use an analytic function like dense_rank() in a subquery to rank the IDs, and then filter on the ranks you want, e.g.:
select id, col1, col2
from (
select id, col1, col2, dense_rank() over (order by id) as rnk
from list_items
)
where rnk <= 10
or for later pages
select id, col1, col2
from (
select id, col1, col2, dense_rank() over (order by id) as rnk
from list_items
)
where rnk > 10 and rnk <= 20
If you have IDs in the list table that have no list_items rows, and you want to take those into account, then you can rank a subquery against that table and join (which lets you include other list columns too):
select l.id, li.col1, li.col2
from (
select id, dense_rank() over (order by id) as rnk
from list
) l
left join list_items li on li.id = l.id
where l.rnk <= 10;
If you're on Oracle 12c or higher you can use the row limiting clause enhancements to simplify that:
select l.id, li.col1, li.col2
from (
select id
from list
order by id
fetch next 10 rows only
) l
left join list_items li on li.id = l.id;
or for the second page:
offset 10 rows fetch next 10 rows only
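Spelled out, the full second-page query would look like:
select l.id, li.col1, li.col2
from (
    select id
    from list
    order by id
    offset 10 rows fetch next 10 rows only
) l
left join list_items li on li.id = l.id;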

Related

How to select rows corresponding to a randomly selected column value in SQL

My query returns a result like the one shown in the table below. I would like to randomly pick an ID from the ID column and get all the rows having that ID. How can I do that in Snowflake or SQL?
ID   | Postalcode | Value | ...
1e3d | NK25F4     | 3214  | ...
1e3d | NK25F4     | 3258  | ...
1e3d | NK25F4     | 3354  | ...
1f74 | NG2LK8     | 5524  | ...
1f74 | NG2LK8     | 5548  | ...
3e9a | N6B7H4     | 3694  | ...
3e9a | N6B7H4     | 3325  | ...
38e4 | N6C7H2     | 3654  | ...
There is a Snowflake feature, SAMPLE, to return a fixed number of "random" rows, so using that will reduce the need to read all rows.
SELECT t.*
FROM your_table as t
JOIN (SELECT ID FROM your_table SAMPLE (1 ROWS)) as r
ON t.id = r.id
thus using your data above:
with your_table(id, postalcode, value) as (
select * from values
('1e3d', 'NK25F4', 3214),
('1e3d', 'NK25F4', 3258),
('1e3d', 'NK25F4', 3354),
('1f74', 'NG2LK8', 5524),
('1f74', 'NG2LK8', 5548),
('3e9a', 'N6B7H4', 3694),
('3e9a', 'N6B7H4', 3325),
('38e4', 'N6C7H2', 3654)
)
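Putting the CTE and the join together, the complete statement would look like:
with your_table(id, postalcode, value) as (
    select * from values
        ('1e3d', 'NK25F4', 3214),
        ('1e3d', 'NK25F4', 3258),
        ('1e3d', 'NK25F4', 3354),
        ('1f74', 'NG2LK8', 5524),
        ('1f74', 'NG2LK8', 5548),
        ('3e9a', 'N6B7H4', 3694),
        ('3e9a', 'N6B7H4', 3325),
        ('38e4', 'N6C7H2', 3654)
)
select t.*
from your_table as t
join (select id from your_table sample (1 rows)) as r
    on t.id = r.id;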
I get a random set, but one run looks like:
ID   | POSTALCODE | VALUE
1f74 | NG2LK8     | 5,524
1f74 | NG2LK8     | 5,548
You could also use a NATURAL JOIN like:
SELECT *
FROM your_table
NATURAL JOIN (SELECT ID FROM your_table SAMPLE (1 ROWS))
You could put your existing query in a common table expression, then pick a random ID from it, and use it to filter the dataset:
with
dat as ( ... your query ...),
tid as (select id from dat order by random() fetch first 1 row)
select d.*
from dat d
inner join tid t on t.id = d.id
The second CTE, tid, picks the random id; it does that by randomly ordering the dataset, then getting the id of the top row.
Something like
SELECT *
FROM Table_NAME
WHERE ID IN (SELECT ID FROM Table_Name ORDER BY RAND() LIMIT 1);
It should work, though it's not particularly efficient; in many application scenarios it would arguably be more reasonable overall to compute the random ID in your application (e.g. keeping the set of all IDs cached and periodically refreshing it separately if need be).
(Note: the query assumes MySQL; other variants may have slightly different keywords/structure, e.g. for the random function.)
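For instance, a PostgreSQL-flavoured version of the same idea (keeping the hypothetical Table_Name/ID names from above) would use random() instead:
SELECT *
FROM Table_Name
WHERE ID IN (SELECT ID FROM Table_Name ORDER BY random() LIMIT 1);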
WITH DATA AS (
select '1e3d' id,'NK25F4' postalcode,3214 some_value union all
select '1e3d' id,'NK25F4' postalcode,3258 some_value union all
select '1e3d' id,'NK25F4' postalcode,3354 some_value union all
select '1f74' id,'NG2LK8' postalcode,5524 some_value union all
select '1f74' id,'NG2LK8' postalcode,5548 some_value union all
select '3e9a' id,'N6B7H4' postalcode,3694 some_value union all
select '3e9a' id,'N6B7H4' postalcode,3325 some_value union all
select '38e4' id,'N6C7H2' postalcode,3654 some_value )
SELECT * FROM DATA ,LATERAL (SELECT ID FROM DATA SAMPLE(2 ROWS)) I WHERE I.ID = DATA.ID
You can also play with the window frame a little and let qualify do the work:
select *
from your_table
qualify id=first_value(id) over (order by random() rows between unbounded preceding and unbounded following)
Snowflake deviates from the ANSI standard on the default window frames for rank-related functions (first_value, last_value, nth_value), so that makes the above equivalent to:
select *
from your_table
qualify id=first_value(id) over (order by random())

How to filter records based on a criteria per group

In SQL how can I express this question: For all groups, return all records in a group that match the result of a condition specific to the group.
For example:
Book: (Id, Title, AuthorName, PublishDate)
Return all records in the Book table that, if the books are grouped by AuthorName, are the fifth or later book of that specific author.
I'm using MS SQL Server 2016.
The solution to this is basically the same as Get top 1 row of each group; however, instead of = 1 you want >= 5:
WITH CTE AS(
SELECT {Columns},
       ROW_NUMBER() OVER (PARTITION BY Author ORDER BY PublishDate) AS RN
FROM dbo.YourTable)
SELECT {Columns}
FROM CTE
WHERE RN >= 5;
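Filled in with the Book columns from the question (assuming the table is named dbo.Book), that might look like:
WITH CTE AS(
    SELECT Id, Title, AuthorName, PublishDate,
           ROW_NUMBER() OVER (PARTITION BY AuthorName ORDER BY PublishDate) AS RN
    FROM dbo.Book)
SELECT Id, Title, AuthorName, PublishDate
FROM CTE
WHERE RN >= 5;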
You can also use a correlated subquery for this:
select t.*
from t
where t.publishdate >= (select t2.publishdate
from t t2
where t2.author = t.author
order by t2.publishdate asc
offset 4 rows fetch first 1 row only
);

Using MAX to compute MAX value in a subquery column

What I am trying to do: I have a table, "band_style" with schema (band_id, style).
One band_id may occur multiple times, listed with different styles.
I want ALL rows of band_id, NUM (where NUM is the number of different styles a band has) for the band ids with the SECOND MOST number of styles.
I have spent hours on this query- almost nothing seems to be working.
This is how far I got. The table (data) successfully computes all bands with styles less than the maximum value of band styles. Now, I need ALL rows that have the Max NUM for the resulting table. This will give me bands with the second most number of styles.
However, this final result seems to be ignoring the MAX function and just returning the table (data) as is. Can someone please provide some insight/working method? I have over 20 attempts of this query with this being the closest.
Using SQL*PLUS on Oracle
WITH data AS (
SELECT band_id, COUNT(*) AS NUM FROM band_style GROUP BY band_id HAVING COUNT(*) <
(SELECT MAX(c) FROM
(SELECT COUNT(band_id) AS c
FROM band_style
GROUP BY band_id)))
SELECT data.band_id, data.NUM FROM data
INNER JOIN ( SELECT band_id m, MAX(NUM) n
FROM data GROUP BY band_id
) t
ON t.m = data.band_id
AND t.n = data.NUM;
Something like this... based on a comment under your post, you are looking for DENSE_RANK():
select band_id
from ( select band_id, dense_rank() over (order by count(style) desc) as drk
from band_style
group by band_id
)
where drk = 2;
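If you also want NUM (the style count) in the output, as the question asks, you can carry it through the inline view, e.g.:
select band_id, num
from ( select band_id,
              count(*) as num,
              dense_rank() over (order by count(*) desc) as drk
       from band_style
       group by band_id
     )
where drk = 2;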
I would use a windowing function (RANK() in this case) - which is great for finding the 'n'-th ranked thing in a set.
SELECT DISTINCT bs.band_id
FROM band_style bs
WHERE EXISTS (
SELECT NULL
FROM (
SELECT
bs2.band_id,
bs2.num,
RANK() OVER (ORDER BY bs2.num DESC) AS numrank
FROM (
SELECT bs1.band_id, COUNT(*) as num
FROM band_style bs1
GROUP BY bs1.band_id ) bs2 ) bs3
WHERE bs.band_id = bs3.band_id
AND bs3.numrank = 2 )

Query historized data

To describe my query problem, the following data is helpful:
A single table contains the columns ID (int), VAL (varchar) and ORD (int)
The values of VAL may change over time by which older items identified by ID won't get updated but appended. The last valid item for ID is identified by the highest ORD value (increases over time).
T0, T1 and T2 are points in time where data got entered.
How do I get to the result set in an efficient manner?
A solution must not involve materialized views etc., but should be expressible in a single SQL query. Using PostgreSQL 9.3.
The correct way to select the groupwise maximum in Postgres is using DISTINCT ON:
SELECT DISTINCT ON (id) sysid, id, val, ord
FROM my_table
ORDER BY id,ord DESC;
Fiddle
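For illustration, with a few hypothetical rows (sysid, id, val, ord as above), the query keeps only the row with the highest ord per id:
with my_table (sysid, id, val, ord) as (
    values (1, 10, 'a', 1),  -- older value for id 10
           (2, 10, 'b', 2),  -- latest value for id 10
           (3, 20, 'x', 1)   -- only value for id 20
)
select distinct on (id) sysid, id, val, ord
from my_table
order by id, ord desc;
-- returns (2, 10, 'b', 2) and (3, 20, 'x', 1)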
You want all records for which no newer record exists:
select *
from mytable
where not exists
(
select *
from mytable newer
where newer.id = mytable.id
and newer.ord > mytable.ord
)
order by id;
You can do the same with row numbers. Give the latest entry per ID the number 1 and keep these:
select sysid, id, val, ord
from
(
select
sysid, id, val, ord,
row_number() over (partition by id order by ord desc) as rn
from mytable
) t
where rn = 1
order by id;
Left join the table (A) against itself (B) on the condition that B is more recent than A. Pick only the rows where B does not exist (i.e. A is the most recent row).
SELECT last_value.*
FROM my_table AS last_value
LEFT JOIN my_table
ON my_table.id = last_value.id
AND my_table.ord > last_value.ord
WHERE my_table.id IS NULL;
SQL Fiddle

Compare SQL groups against each other

How can one filter a grouped resultset for only those groups that meet some criterion compared against the other groups? For example, only those groups that have the maximum number of constituent records?
I had thought that a subquery as follows should do the trick:
SELECT * FROM (
SELECT *, COUNT(*) AS Records
FROM T
GROUP BY X
) t HAVING Records = MAX(Records);
However the addition of the final HAVING clause results in an empty recordset... what's going on?
In MySQL (which I assume you are using, since you have posted SELECT *, COUNT(*) FROM T GROUP BY X, which would fail in all other RDBMS that I know of), you can use:
SELECT T.*
FROM T
INNER JOIN
( SELECT X, COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
) T2
ON T2.X = T.X
This has been tested in MySQL and removes the implicit grouping/aggregation.
If you can use window functions, together with either common table expressions or TOP/LIMIT ... WITH TIES, it becomes even shorter:
Windowed function + CTE: (MS SQL-Server & PostgreSQL Tested)
WITH CTE AS
( SELECT *, COUNT(*) OVER(PARTITION BY X) AS Records
FROM T
)
SELECT *
FROM CTE
WHERE Records = (SELECT MAX(Records) FROM CTE)
Windowed Function with TOP (MS SQL-Server Tested)
SELECT TOP 1 WITH TIES *
FROM ( SELECT *, COUNT(*) OVER(PARTITION BY X) [Records]
FROM T
) t
ORDER BY Records DESC
Lastly, I have never used Oracle, so apologies for not adding a solution that works on Oracle...
EDIT
My solution for MySQL did not take ties into account, and my suggested fix for that steps on the toes of what you have said you want to avoid (duplicate subqueries), so I am not sure I can help after all. However, just in case it is preferable, here is a version that will work as required on your fiddle:
SELECT T.*
FROM T
INNER JOIN
( SELECT X
FROM T
GROUP BY X
HAVING COUNT(*) =
( SELECT COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
)
) T2
ON T2.X = T.X
For the exact question you give, one way to look at it is that you want the group of records where there is no other group that has more records. So if you say
SELECT taxid, COUNT(*) as howMany
FROM flats
GROUP by taxid
You get all counties and their counts
Then you can treat that expression as a table by making it a subquery and giving it an alias. Below I assign two "copies" of the query the names X and Y and ask for taxids that don't have a higher count in the other copy. If there are two with the same number I'd get two or more. Different databases have proprietary syntax, notably TOP and LIMIT, that makes this kind of query simpler and easier to understand (see the sketch after the query below).
SELECT taxid FROM
(select taxid, count(*) as HowMany from flats
GROUP by taxid) as X
WHERE NOT EXISTS
(
SELECT * from
(
SELECT taxid, count(*) as HowMany FROM
flats
GROUP by taxid
) AS Y
WHERE Y.howmany > X.howmany
)
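As a sketch of that TOP/LIMIT simplification (MySQL-style LIMIT, same flats/taxid names; note that, unlike the NOT EXISTS version, it returns only one taxid when there is a tie):
SELECT taxid
FROM flats
GROUP BY taxid
ORDER BY COUNT(*) DESC
LIMIT 1;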
Try this:
SELECT * FROM (
SELECT t.*, MAX(Records) OVER () AS max_records FROM (
SELECT *, COUNT(*) AS Records
FROM T
GROUP BY X
) t
) t2 WHERE Records = max_records
I'm sorry that I can't test the validity of this query right now.