I have an Oracle table with the following columns:
Table Structure
In one query I need to return all the records with CPER >= 40, which is trivial. However, in addition to the CPER >= 40 records, I need to list 5 random records for each CPID.
I have attached a sample list of records. However, in my table I have around 50,000 records.
Appreciate if you can help.
Oracle solution:
with CTE as
(
select t1.*,
row_number() over(partition by CPID order by DBMS_RANDOM.VALUE) as rn -- random order assigned within each CPID
from MyTable t1
where CPER < 40
)
select *
from CTE
where rn <= 5 -- pick 5 at random per CPID
union all
select t2.*, null
from MyTable t2
where CPER >= 40
SQL Server:
with CTE as
(
select t1.*,
row_number() over(partition by CPID order by newid()) as rn -- random order assigned within each CPID
from MyTable t1
where CPER < 40
)
select *
from CTE
where rn <= 5 -- pick 5 at random per CPID
union all
select t2.*, null
from MyTable t2
where CPER >= 40
How about something like this...
SELECT *
FROM (SELECT CID,
CVAL,
CPID,
CPER,
Row_number() OVER (partition BY CPID ORDER BY CPID ASC ) AS RN
FROM MyTable) tmp
WHERE CPER >= 40 OR RN <= 5
However, this is not random.
Assuming that you want five additional random records, you can do:
select t.*
from (select t.*,
row_number() over (partition by cpid,
(case when cper >= 40 then 1 else 2 end)
order by dbms_random.value
) as seqnum
from t
) t
where seqnum <= 5 or cper >= 40;
The row_number() is enumerating the rows for each cpid in two groups -- based on the cper value. The outer where is taking all cper values in the range you want as well as five from the other group.
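As a rough way to try the partitioning idea above without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. The sample data is made up, and SQLite's random() stands in for dbms_random.value; both are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cid INTEGER, cval TEXT, cpid INTEGER, cper INTEGER)")
# Hypothetical data: 3 cpid groups of 20 rows each, cper = 0, 5, ..., 95.
rows = [(cpid * 100 + i, f"v{i}", cpid, i * 5) for cpid in (1, 2, 3) for i in range(20)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

query = """
SELECT cid, cval, cpid, cper
FROM (SELECT t.*,
             ROW_NUMBER() OVER (
                 PARTITION BY cpid, (CASE WHEN cper >= 40 THEN 1 ELSE 2 END)
                 ORDER BY random()  -- SQLite stand-in for dbms_random.value
             ) AS seqnum
      FROM t) AS x
WHERE seqnum <= 5 OR cper >= 40
"""
result = conn.execute(query).fetchall()

# Every row with cper >= 40 is kept; at most 5 rows with cper < 40 per cpid.
for cpid in (1, 2, 3):
    high = [r for r in result if r[2] == cpid and r[3] >= 40]
    low = [r for r in result if r[2] == cpid and r[3] < 40]
    print(cpid, len(high), len(low))
```

Each group prints 12 high rows and 5 low rows here, because the made-up data has 12 rows at or above 40 and 8 below it per cpid.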
I have the following dataset:
id | id_rev | time
----------
1|1|08.01.2022
1|0|31.02.2021
2|2|28.01.2017
2|1|25.07.2021
2|0|25.07.2021
I am looking for a SQL query that can return an entry per id but only the one where the id_rev is maximum. So in this case it should return these two rows:
(id=1, id_rev=1,time)
(id=2, id_rev=2, time)
One canonical approach uses ROW_NUMBER:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id_rev DESC) rn
FROM yourTable t
)
SELECT id, id_rev, time
FROM cte
WHERE rn = 1
ORDER BY id;
Another approach would be to use exists logic:
SELECT id, id_rev, time
FROM yourTable t1
WHERE NOT EXISTS (
SELECT 1
FROM yourTable t2
WHERE t2.id = t1.id AND t2.id_rev > t1.id_rev
);
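To check that the two approaches above agree, here is a small sketch that loads the question's sample rows into an in-memory SQLite database (via Python's sqlite3 module, chosen only so the example is runnable) and runs both queries. The table name yourTable comes from the answer; storing time as text is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, id_rev INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(1, 1, "08.01.2022"), (1, 0, "31.02.2021"),
     (2, 2, "28.01.2017"), (2, 1, "25.07.2021"), (2, 0, "25.07.2021")],
)

row_number_sql = """
WITH cte AS (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id_rev DESC) AS rn
    FROM yourTable t
)
SELECT id, id_rev, time FROM cte WHERE rn = 1 ORDER BY id
"""
not_exists_sql = """
SELECT id, id_rev, time
FROM yourTable t1
WHERE NOT EXISTS (
    SELECT 1 FROM yourTable t2
    WHERE t2.id = t1.id AND t2.id_rev > t1.id_rev
)
ORDER BY id
"""
by_row_number = conn.execute(row_number_sql).fetchall()
by_not_exists = conn.execute(not_exists_sql).fetchall()
print(by_row_number)  # [(1, 1, '08.01.2022'), (2, 2, '28.01.2017')]
```

The two differ only when several rows share the maximum id_rev for an id: ROW_NUMBER keeps one of them arbitrarily, while the NOT EXISTS form returns all of them.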
A variant uses RANK(), which returns every tied row when several rows share the maximum id_rev:
WITH ranked AS (
SELECT d.*,
RANK() OVER (PARTITION BY id ORDER BY id_rev DESC) AS rnk
FROM dataset d
)
SELECT *
FROM ranked
WHERE rnk = 1;
I am wondering if it is possible to create a new column and filter on that column in the same query. The following is an example:
SELECT row_number() over (partition by ID order by date asc) row# FROM table1 where row# = 1
Thanks!
Some databases support a QUALIFY clause which you might be able to use:
SELECT *
FROM table1
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) = 1;
On SQL Server, you may use a TOP 1 WITH TIES trick:
SELECT TOP 1 WITH TIES *
FROM table1
ORDER BY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date);
More generally, you would have to use a subquery:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) rn
FROM table1 t
)
SELECT *
FROM cte
WHERE rn = 1;
The WHERE clause is evaluated before the SELECT list, so the row# alias does not exist yet when the filter runs. You can work around this by wrapping the original query in a subquery and filtering in the outer query.
SELECT *
FROM
(
SELECT row_number() over (partition by ID order by date asc) row#
FROM table1
) a
WHERE a.row# = 1
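A quick sketch of the subquery form, run against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, and the alias is spelled rn rather than row# because # is not valid in an unquoted identifier in most dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, date TEXT)")
# Hypothetical rows: two dates per ID, stored as ISO strings so they sort correctly.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "2021-03-01"), (1, "2021-01-15"), (2, "2020-07-04"), (2, "2020-09-09")],
)

sql = """
SELECT ID, date
FROM (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date ASC) AS rn
    FROM table1 t
) a
WHERE a.rn = 1   -- the alias exists here because the inner query has already run
ORDER BY ID
"""
earliest = conn.execute(sql).fetchall()
print(earliest)  # [(1, '2021-01-15'), (2, '2020-07-04')]
```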
I have duplicate data in my table, and I want to fetch only the rows with the largest values from each set of duplicates.
I added an image as an example. From it I want to get only the last two rows, because the first row's first-column value is lower than the others and the service IDs are the same. I am trying to do this by counting the data, but I can't get the final result.
Currently I am using this query to count the data:
SELECT
ServiceId, COUNT(*) Count_Duplicate
FROM
TestDeleteTable
GROUP BY
ServiceId
HAVING
COUNT(*) > 1
ORDER BY
COUNT(*) DESC
Thanks for any help
The following query should work for you:
SELECT ServiceId,RowId FROM
(
SELECT *, COUNT(ServiceId) OVER(PARTITION BY ServiceId ORDER BY ROWID) CT, ROW_NUMBER() OVER(PARTITION BY ServiceId ORDER BY ROWID) RN
FROM TestDeleteTable
)T
WHERE T.RN> 1 AND T.CT > 1
Another approach can be
;WITH CTE AS
(
SELECT ServiceId, MIN(ROWID) M
FROM TestDeleteTable
GROUP BY ServiceId
HAVING COUNT(*) > 1
)
SELECT * FROM TestDeleteTable T
WHERE EXISTS
(
SELECT 1 FROM CTE C WHERE C.ServiceId=T.ServiceId AND T.ROWID > C.M
)
Or simply use an INNER JOIN with the CTE, like the following:
;WITH CTE AS
(
SELECT ServiceId, MIN(ROWID) MinValue, COUNT(ServiceId) CountService
FROM TestDeleteTable
GROUP BY ServiceId
HAVING COUNT(*) > 1
)
SELECT T.* FROM TestDeleteTable T
INNER JOIN CTE C ON T.ServiceId = C.ServiceId
WHERE C.CountService > 1 AND T.ROWID > C.MinValue
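The following sketch checks the "every duplicate after the first" logic on a tiny made-up table, using Python's sqlite3 module so it can be run anywhere. The RowId values and ServiceId values are hypothetical; the image from the question is not available, so the data only mimics the described case of three rows sharing one service ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestDeleteTable (RowId INTEGER PRIMARY KEY, ServiceId INTEGER)")
# Hypothetical data: ServiceId 20 appears three times, the others once.
conn.executemany(
    "INSERT INTO TestDeleteTable VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 20), (4, 20), (5, 30)],
)

sql = """
SELECT ServiceId, RowId
FROM (
    SELECT t.*,
           COUNT(*)     OVER (PARTITION BY ServiceId)                AS ct,
           ROW_NUMBER() OVER (PARTITION BY ServiceId ORDER BY RowId) AS rn
    FROM TestDeleteTable t
) x
WHERE x.rn > 1 AND x.ct > 1   -- skip the first row of each duplicated ServiceId
ORDER BY RowId
"""
later_duplicates = conn.execute(sql).fetchall()
print(later_duplicates)  # [(20, 3), (20, 4)]
```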
For example:
My table has 10000 rows. First I will divide it into 5 sets of 2000 (k) rows. Then from each set of 2000 rows I will select only the top 100 (n) rows.
With this approach I am trying to scan some rows of the table in a specific pattern.
Assuming you are ordering the rows 1 to 10000 using some logic and want to output only rows 1-100, 2001-2100, 4001-4100, etc., you can use the ROWNUM pseudocolumn:
SELECT *
FROM (
SELECT t.*,
ROWNUM AS rn -- Secondly, assign a row number to the ordered rows
FROM (
SELECT *
FROM your_table
ORDER BY your_condition -- First, order the data
) t
)
WHERE MOD( rn - 1, 2000 ) < 100; -- Finally, filter the top 100 per 2000.
Or you could use the ROW_NUMBER() analytic function:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( ORDER BY your_condition ) AS rn
FROM your_table
)
WHERE MOD( rn - 1, 2000 ) < 100;
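To see which rows the modulo filter keeps, here is a scaled-down sketch with k = 20 and n = 3 over 100 rows, run in SQLite through Python's sqlite3 module. The single column n and the small sizes are assumptions for readability; SQLite's % operator plays the role of MOD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (n INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?)", [(i,) for i in range(1, 101)])

k, top_n = 20, 3  # set size and rows kept per set (2000 and 100 in the question)
sql = """
SELECT rn
FROM (
    SELECT t.*, ROW_NUMBER() OVER (ORDER BY n) AS rn
    FROM your_table t
) x
WHERE (rn - 1) % ? < ?
ORDER BY rn
"""
kept = [row[0] for row in conn.execute(sql, (k, top_n))]
print(kept)  # [1, 2, 3, 21, 22, 23, 41, 42, 43, 61, 62, 63, 81, 82, 83]
```

Subtracting 1 before taking the remainder makes each set start at remainder 0, so the first top_n rows of every set pass the filter.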
Is it possible to increase the size of the sample sets exponentially, like 1k, 2k, 4k, 8k, ..., and then fetch some rows from each of these?
Replace the WHERE clause with:
WHERE rn - POWER(
2,
TRUNC( CAST( LOG( 2, CEIL( rn / 1000 ) ) AS NUMBER(20,4) ) )
) * 1000 + 1000 <= 100
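The arithmetic in that WHERE clause can be traced with a short Python sketch. The function names are hypothetical, and integer operations replace LOG, CEIL and TRUNC so that no rounding cast is needed; this mirrors the formula rather than reproducing Oracle's exact evaluation.

```python
def bucket_start(rn: int, base: int = 1000) -> int:
    """First row number of the exponentially sized bucket that contains rn."""
    blocks = -(-rn // base)             # CEIL(rn / base) using integer division
    exponent = blocks.bit_length() - 1  # TRUNC(LOG(2, blocks)), exact for integers
    return (2 ** exponent) * base - base + 1


def is_sampled(rn: int, base: int = 1000, take: int = 100) -> bool:
    # Equivalent to: rn - POWER(2, exponent) * base + base <= take
    return rn - bucket_start(rn, base) + 1 <= take


sampled = [rn for rn in range(1, 15001) if is_sampled(rn)]
starts = sorted({bucket_start(rn) for rn in range(1, 15001)})
print(starts)  # [1, 1001, 3001, 7001]
```

The buckets therefore cover rows 1-1000, 1001-3000, 3001-7000 and 7001-15000 (sizes 1k, 2k, 4k, 8k), and the filter keeps the first 100 rows of each.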
This solution uses the analytic ntile() to split the raw data into five buckets. That result set is labelled using the analytic row_number() which provides a filter to produce the final set:
with sq1 as ( select id, col1, ntile(5) over (order by id asc) as quintile
from t23
)
, sq2 as ( select id, col1, quintile
, row_number() over ( partition by quintile order by id asc) as rn
from sq1 )
select *
from sq2
where rn <= 200
order by quintile, rn
/
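A scaled-down sketch of the ntile() approach, using 50 rows and keeping the top 2 of each quintile, run in SQLite via Python's sqlite3 module. The table t23 and its columns follow the answer; the row count, the col1 values and the cutoff of 2 are assumptions chosen to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t23 (id INTEGER, col1 TEXT)")
conn.executemany("INSERT INTO t23 VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 51)])

sql = """
WITH sq1 AS (
    SELECT id, col1, NTILE(5) OVER (ORDER BY id ASC) AS quintile
    FROM t23
),
sq2 AS (
    SELECT id, col1, quintile,
           ROW_NUMBER() OVER (PARTITION BY quintile ORDER BY id ASC) AS rn
    FROM sq1
)
SELECT id, quintile, rn
FROM sq2
WHERE rn <= 2
ORDER BY quintile, rn
"""
picked = conn.execute(sql).fetchall()
print([row[0] for row in picked])  # [1, 2, 11, 12, 21, 22, 31, 32, 41, 42]
```

Unlike the modulo filter, ntile() does not need the total row count up front: it splits whatever rows exist into five near-equal buckets.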
Use PARTITION BY and ORDER BY with row_number(). It will look like the following:
row_number() over (partition by partition_column order by order_column) <= 100
partition_column is the condition that divides the rows into sets.
order_column is the condition that decides which 100 rows come first in each set.
For example I have the following database entries:
timestamp | value1 | value 2
----------
1452|5|7
1452|1|6
1452|2|7
1623|1|2
1623|5|6
1623|4|5
1623|4|7
1855|1|2
Now I want a SQL query that returns value1 only for the timestamp that occurs most often. Therefore it should return only the timestamp 1623 and its values.
I was first thinking of COUNT, but that returns only the number of occurrences and not the entries themselves.
select *
from T
inner join (select timestamp
from T
group by timestamp
order by count(*) desc
limit 1) t2
on T.timestamp = t2.timestamp
See it working live in a sqlfiddle.
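The same join can be tried locally with the question's sample rows. This sketch uses SQLite through Python's sqlite3 module, which accepts the LIMIT syntax from the answer; naming the third column value2 (without the space) is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (timestamp INTEGER, value1 INTEGER, value2 INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1452, 5, 7), (1452, 1, 6), (1452, 2, 7),
     (1623, 1, 2), (1623, 5, 6), (1623, 4, 5), (1623, 4, 7),
     (1855, 1, 2)],
)

sql = """
SELECT T.timestamp, T.value1, T.value2
FROM T
INNER JOIN (SELECT timestamp
            FROM T
            GROUP BY timestamp
            ORDER BY COUNT(*) DESC
            LIMIT 1) t2            -- the single most frequent timestamp
        ON T.timestamp = t2.timestamp
"""
most_common_rows = sorted(conn.execute(sql).fetchall())
print(most_common_rows)  # [(1623, 1, 2), (1623, 4, 5), (1623, 4, 7), (1623, 5, 6)]
```

If two timestamps tie for the highest count, LIMIT 1 picks one of them without any guaranteed preference.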
WITH CTE AS (
SELECT *, COUNT(timestamps) OVER (PARTITION BY value1, timestamps) AS cnt
FROM mytable
), cte2 as (select *, row_number() over (partition by value1 order by cnt DESC, timestamps) as Rn FROM cte)
SELECT value1, timestamps , cnt FROM CTE2 WHERE Rn = 1;