Random records in Oracle table based on conditions - sql

I have an Oracle table with the following columns:
Table Structure
I need a query that returns all records with CPER >= 40, which is trivial. In addition to those, I also need 5 random records for each CPID.
I have attached a sample list of records; the real table has around 50,000 records.
I would appreciate any help.

Oracle solution:
with CTE as
(
select t1.*,
row_number() over(partition by CPID order by DBMS_RANDOM.VALUE) as rn -- random order within each CPID
from MyTable t1
where CPER < 40
)
select *
from CTE
where rn <= 5 -- pick 5 at random per CPID
union all
select t2.*, null
from MyTable t2
where CPER >= 40
SQL Server:
with CTE as
(
select t1.*,
row_number() over(partition by CPID order by newid()) as rn -- random order within each CPID
from MyTable t1
where CPER < 40
)
select *
from CTE
where rn <= 5 -- pick 5 at random per CPID
union all
select t2.*, null
from MyTable t2
where CPER >= 40

How about something like this...
SELECT *
FROM (SELECT CID,
CVAL,
CPID,
CPER,
Row_number() OVER (partition BY CPID ORDER BY CPID ASC ) AS RN
FROM MyTable) tmp
WHERE CPER >= 40 OR RN <= 5
However, this is not random: ordering by CPID inside a CPID partition leaves the row order arbitrary rather than randomized.

Assuming that you want five additional random records, you can do:
select t.*
from (select t.*,
row_number() over (partition by cpid,
(case when cper >= 40 then 1 else 2 end)
order by dbms_random.value
) as seqnum
from t
) t
where seqnum <= 5 or cper >= 40;
The row_number() is enumerating the rows for each cpid in two groups -- based on the cper value. The outer where is taking all cper values in the range you want as well as five from the other group.
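As a rough check of the partitioning idea described above, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The sample data is invented for illustration, `random()` stands in for Oracle's `dbms_random.value`, and it assumes the bundled SQLite is 3.25 or newer (window function support).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cid INTEGER, cpid INTEGER, cper INTEGER)")

# Hypothetical data: 2 cpid groups, each with cper = 0, 5, ..., 95 (20 rows per group).
data = [(cpid * 100 + cper, cpid, cper) for cpid in (1, 2) for cper in range(0, 100, 5)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", data)

query = """
SELECT cid, cpid, cper
FROM (SELECT t.*,
             ROW_NUMBER() OVER (
                 PARTITION BY cpid, (CASE WHEN cper >= 40 THEN 1 ELSE 2 END)
                 ORDER BY random()   -- SQLite stand-in for dbms_random.value
             ) AS seqnum
      FROM t) AS x
WHERE seqnum <= 5 OR cper >= 40
"""
result = conn.execute(query).fetchall()

high = [r for r in result if r[2] >= 40]  # every row with cper >= 40
low = [r for r in result if r[2] < 40]    # 5 random rows per cpid from the rest
print(len(high), len(low))  # 24 10
```

Each group has 12 rows with cper >= 40 and 8 below it, so the query keeps all 24 high rows plus 5 of the 8 low rows per cpid. Which 5 are kept changes from run to run.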

Related

SQL get entries where one attribute is max

I have the following dataset:
id | id_rev | time
----------
1|1|08.01.2022
1|0|31.02.2021
2|2|28.01.2017
2|1|25.07.2021
2|0|25.07.2021
I am looking for a SQL query that can return an entry per id but only the one where the id_rev is maximum. So in this case it should return these two rows:
(id=1, id_rev=1,time)
(id=2, id_rev=2, time)
One canonical approach uses ROW_NUMBER:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id_rev DESC) rn
FROM yourTable t
)
SELECT id, id_rev, time
FROM cte
WHERE rn = 1
ORDER BY id;
Another approach would be to use exists logic:
SELECT id, id_rev, time
FROM yourTable t1
WHERE NOT EXISTS (
SELECT 1
FROM yourTable t2
WHERE t2.id = t1.id AND t2.id_rev > t1.id_rev
);
-- RANK() keeps ties: every row sharing the highest id_rev for an id gets rank 1
WITH ranked AS (
SELECT d.*,
RANK() OVER (PARTITION BY id ORDER BY id_rev DESC) AS rnk
FROM dataset d
)
SELECT *
FROM ranked
WHERE rnk = 1;
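To compare the ROW_NUMBER and NOT EXISTS approaches above on the question's sample rows, here is a small sketch using Python's built-in `sqlite3` module. The table name `yourTable` comes from the answers; using SQLite instead of the asker's database is an assumption made only so the example is runnable (SQLite 3.25+ is needed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, id_rev INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(1, 1, "08.01.2022"), (1, 0, "31.02.2021"),
     (2, 2, "28.01.2017"), (2, 1, "25.07.2021"), (2, 0, "25.07.2021")],
)

# Approach 1: number rows per id by descending id_rev and keep the first.
row_number_sql = """
WITH cte AS (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id_rev DESC) AS rn
    FROM yourTable t
)
SELECT id, id_rev, time FROM cte WHERE rn = 1 ORDER BY id
"""

# Approach 2: keep rows for which no row with the same id has a larger id_rev.
not_exists_sql = """
SELECT id, id_rev, time
FROM yourTable t1
WHERE NOT EXISTS (
    SELECT 1 FROM yourTable t2
    WHERE t2.id = t1.id AND t2.id_rev > t1.id_rev
)
ORDER BY id
"""

via_row_number = conn.execute(row_number_sql).fetchall()
via_not_exists = conn.execute(not_exists_sql).fetchall()
print(via_row_number)  # [(1, 1, '08.01.2022'), (2, 2, '28.01.2017')]
```

Both queries return the same two rows here. They differ only when the maximum id_rev is tied within an id: ROW_NUMBER keeps one arbitrary row, while NOT EXISTS keeps all tied rows.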

Creating column and filtering it in one select statement

I am wondering whether it is possible to create a new column and filter on that column in the same SELECT statement. The following is an example:
SELECT row_number() over (partition by ID order by date asc) row# FROM table1 where row# = 1
Thanks!
Some databases support a QUALIFY clause which you might be able to use:
SELECT *
FROM table1
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) = 1;
On SQL Server, you may use a TOP 1 WITH TIES trick:
SELECT TOP 1 WITH TIES *
FROM table1
ORDER BY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date);
More generally, you would have to use a subquery:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) rn
FROM table1 t
)
SELECT *
FROM cte
WHERE rn = 1;
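The WHERE clause is evaluated before the SELECT list, so a column alias defined in SELECT does not exist yet when WHERE runs (and window functions are not allowed in WHERE at all). You can achieve this by wrapping the original query in a subquery and filtering in the outer query: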
SELECT *
FROM
(
SELECT row_number() over (partition by ID order by date asc) row#
FROM table1
) a
WHERE a.row# = 1
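The subquery pattern can be tried out with a short sketch like the one below, run against in-memory SQLite from Python. The rows are invented for illustration, and the alias is named `rn` rather than `row#` because `#` is not accepted in unquoted SQLite identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, date TEXT)")
# Hypothetical rows: two dates per ID, inserted out of order on purpose.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "2021-03-01"), (1, "2021-01-15"), (2, "2020-07-04"), (2, "2020-12-31")],
)

sql = """
SELECT ID, date
FROM (
    SELECT ID, date,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date ASC) AS rn
    FROM table1
) a
WHERE a.rn = 1      -- rn exists here because the inner query has already run
ORDER BY ID
"""
earliest = conn.execute(sql).fetchall()
print(earliest)  # [(1, '2021-01-15'), (2, '2020-07-04')]
```

The inner query computes the row number for every row; the outer query then treats `rn` like any ordinary column.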

Can't find largest duplicate value in SQL Server

I have duplicate data in my table, and I want to fetch only the largest values from each set of duplicates.
I added an image as an example. From it, I want to get only the last two rows, because the first row's first column value is lower than the others and the service IDs are the same. I tried to do this by counting the rows, but I can't get the final result.
Currently I am using this query to count the duplicates:
SELECT
ServiceId, COUNT(*) Count_Duplicate
FROM
TestDeleteTable
GROUP BY
ServiceId
HAVING
COUNT(*) > 1
ORDER BY
COUNT(*) DESC
Thanks for any help
The following query should work for you:
SELECT ServiceId,RowId FROM
(
SELECT *, COUNT(ServiceId) OVER(PARTITION BY ServiceId ORDER BY ROWID) CT, ROW_NUMBER() OVER(PARTITION BY ServiceId ORDER BY ROWID) RN
FROM TestDeleteTable
)T
WHERE T.RN> 1 AND T.CT > 1
Another approach can be
;WITH CTE AS
(
SELECT ServiceId, MIN(ROWID) M
FROM TestDeleteTable
GROUP BY ServiceId
HAVING COUNT(*) > 1
)
SELECT * FROM TestDeleteTable T
WHERE EXISTS
(
SELECT 1 FROM CTE C WHERE C.ServiceId=T.ServiceId AND T.ROWID > C.M
)
Or simply use an INNER JOIN with the CTE, as follows:
;WITH CTE AS
(
SELECT ServiceId, MIN(ROWID) MinValue, Count(ServiceId) CountService
FROM TestDeleteTable
GROUP BY ServiceId
HAVING COUNT(*) > 1
)
SELECT T.* FROM TestDeleteTable T
INNER JOIN CTE C ON T.ServiceId= C.ServiceId
WHERE C.CountService> 1 AND T.ROWID > C.MinValue

Oracle: I need to select n rows from every k rows of a table

For example:
My table has 10000 rows. First I will divide it into 5 sets of 2000 (k) rows. Then from each set of 2000 rows I will select only the top 100 (n) rows.
With this approach I am trying to scan some rows of the table in a specific pattern.
Assuming you are ordering them 1 - 10000 using some logic and want to output only rows 1-100,2001-2100,4001-4100,etc then you can use the ROWNUM pseudocolumn:
SELECT *
FROM (
SELECT t.*,
ROWNUM AS rn -- Secondly, assign a row number to the ordered rows
FROM (
SELECT *
FROM your_table
ORDER BY your_condition -- First, order the data
) t
)
WHERE MOD( rn - 1, 2000 ) < 100; -- Finally, filter the top 100 per 2000.
Or you could use the ROW_NUMBER() analytic function:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( ORDER BY your_condition ) AS rn
FROM your_table
)
WHERE MOD( rn - 1, 2000 ) < 100;
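The modulo filter can be sanity-checked with a sketch such as the following, which builds a 10000-row table in in-memory SQLite from Python. The `id` column used for ordering is a hypothetical stand-in for `your_condition`, and SQLite's `%` operator replaces Oracle's `MOD()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?)", [(i,) for i in range(1, 10001)])

sql = """
SELECT rn
FROM (
    SELECT t.*, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM your_table t
)
WHERE (rn - 1) % 2000 < 100   -- first 100 rows of every block of 2000
ORDER BY rn
"""
picked = [row[0] for row in conn.execute(sql)]
block_starts = picked[::100]  # first picked row number of each block
print(len(picked), block_starts)  # 500 [1, 2001, 4001, 6001, 8001]
```

Subtracting 1 before taking the remainder makes the blocks start at rows 1, 2001, 4001 and so on, instead of at 2000, 4000, and so on.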
Is it possible to increase the size of the sample sets exponentially, like 1k, 2k, 4k, 8k, and so on, and then fetch some rows from each of these?
Replace the WHERE clause with:
WHERE rn - POWER(
2,
TRUNC( CAST( LOG( 2, CEIL( rn / 1000 ) ) AS NUMBER(20,4) ) )
) * 1000 + 1000 <= 100
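To see which row numbers this predicate keeps, here is a small pure-Python sketch of the same arithmetic. The function name `in_sample` and its parameters are hypothetical, and the floor of the base-2 logarithm is computed with integer operations rather than Oracle's `LOG` and `TRUNC`.

```python
def in_sample(rn: int, base: int = 1000, n: int = 100) -> bool:
    """Mirror of: rn - POWER(2, TRUNC(LOG(2, CEIL(rn / base)))) * base + base <= n."""
    blocks = (rn + base - 1) // base         # CEIL(rn / base) using integers
    power = 2 ** (blocks.bit_length() - 1)   # largest power of two <= blocks
    return rn - power * base + base <= n

selected = [rn for rn in range(1, 10001) if in_sample(rn)]
set_starts = selected[::100]  # first selected row number of each set
print(len(selected), set_starts)  # 400 [1, 1001, 3001, 7001]
```

With 10000 rows the sets begin at rows 1, 1001, 3001 and 7001, so their sizes are 1k, 2k, 4k and (truncated by the table size) 8k, and the first 100 rows of each are kept.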
This solution uses the analytic ntile() to split the raw data into five buckets. That result set is labelled using the analytic row_number() which provides a filter to produce the final set:
with sq1 as ( select id, col1, ntile(5) over (order by id asc) as quintile
from t23
)
, sq2 as ( select id, col1, quintile
, row_number() over ( partition by quintile order by id asc) as rn
from sq1 )
select *
from sq2
where rn <= 100 -- n = 100 rows per bucket, as in the question
order by quintile, rn
/
Use PARTITION BY and ORDER BY with row_number(). Compute it in a subquery like this, then filter on rn <= 100 in the outer query:
row_number() over (partition by partition_column order by order_column) as rn
partition_column is the expression that divides the rows into sets.
order_column is the expression that decides which rows count as the top 100.

Get entries which are mostly available

For example I have the following database entries:
timestamp | value1 | value2
----------
1452|5|7
1452|1|6
1452|2|7
1623|1|2
1623|5|6
1623|4|5
1623|4|7
1855|1|2
Now I want a SQL query that returns value1 only for the timestamp that occurs most often. In this example it should return only timestamp 1623 and its values.
I first thought of using count, but that returns only the number of occurrences and not the entries themselves.
select *
from T
inner join (select timestamp
from T
group by timestamp
order by count(*) desc
limit 1) t2
on T.timestamp = t2.timestamp
WITH CTE AS (
SELECT *, COUNT(timestamps) OVER (PARTITION BY value1, timestamps) AS cnt
FROM mytable
), cte2 as (select *, row_number() over (partition by value1 order by cnt DESC, timestamps) as Rn FROM cte)
SELECT value1, timestamps , cnt FROM CTE2 WHERE Rn = 1;
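The join-based query from the first answer can be checked against the question's sample rows with a sketch like this one, run on in-memory SQLite from Python. Quoting `"timestamp"` as an identifier is a precaution added here, not something the original answer does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE T ("timestamp" INTEGER, value1 INTEGER, value2 INTEGER)')
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", [
    (1452, 5, 7), (1452, 1, 6), (1452, 2, 7),
    (1623, 1, 2), (1623, 5, 6), (1623, 4, 5), (1623, 4, 7),
    (1855, 1, 2),
])

sql = """
SELECT T."timestamp", T.value1, T.value2
FROM T
INNER JOIN (SELECT "timestamp"
            FROM T
            GROUP BY "timestamp"
            ORDER BY COUNT(*) DESC   -- most frequent timestamp first
            LIMIT 1) t2
        ON T."timestamp" = t2."timestamp"
ORDER BY T.value1, T.value2
"""
most_common = conn.execute(sql).fetchall()
print(most_common)
# [(1623, 1, 2), (1623, 4, 5), (1623, 4, 7), (1623, 5, 6)]
```

The derived table finds the single most frequent timestamp, and the join brings back every row that carries it. If two timestamps tie for the highest count, `LIMIT 1` keeps only one of them.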