SQL Partition by with conditions

I want to partition the data on the basis of two columns, Type and Env, and fetch the top 5 records for each partition ordered by count desc. The problem I'm facing is that I need to partition Env on the basis of a LIKE condition.
Data -
Type Environment Count
T1 E1 1
T1 M1 2
T1 AB1 3
T2 E1 1
T2 M1 2
T2 CB1 3
T2 M1 5
The result that I want - Let's say I'm fetching top (1) record for now
Type Environment Count
T1 M1 2
T1 AB1 3
T2 CB1 3
T2 M1 5
Here I'm dividing the env on the condition (env LIKE "%M%" and env NOT LIKE "%M%")
One approach that I can think of is using partition and union but this is a very expensive call due to the large amount of data that I'm filtering from. Is there a better way to achieve this?
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Count DESC) AS maxCount
FROM
table
WHERE
Env LIKE '%M%'
) AS t1
WHERE
t1.maxCount <= 5
UNION
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Count DESC) AS maxCount
FROM
table
WHERE
Env NOT LIKE '%M%'
) AS t1
WHERE
t1.maxCount <= 5

You seem to want an additional partition by key in your row_number():
select t.*
from (select t.*,
             row_number() over (partition by type,
                                             (case when environment like '%M%' then 1 else 2 end)
                                order by count desc
                               ) as seqnum
      from t
     ) t
where seqnum <= 5;
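To check the conditional partition against the sample data in the question, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. The table name t, the lowercase column names and the TOP_N parameter are assumptions made for this illustration, and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (type TEXT, environment TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("T1", "E1", 1), ("T1", "M1", 2), ("T1", "AB1", 3),
     ("T2", "E1", 1), ("T2", "M1", 2), ("T2", "CB1", 3), ("T2", "M1", 5)],
)

TOP_N = 1  # the question uses 1 for its expected output; the real target is 5

# The CASE expression splits each type into "matches %M%" and "does not match",
# so a single pass numbers both groups without a UNION.
query = """
SELECT type, environment, count
FROM (SELECT t.*,
             ROW_NUMBER() OVER (
                 PARTITION BY type,
                              CASE WHEN environment LIKE '%M%' THEN 1 ELSE 2 END
                 ORDER BY count DESC
             ) AS seqnum
      FROM t) AS ranked
WHERE seqnum <= ?
ORDER BY type, count
"""
rows = conn.execute(query, (TOP_N,)).fetchall()
for row in rows:
    print(row)
```

With TOP_N = 1 this returns the four rows listed as the desired result in the question; the table is read once instead of twice.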

Related

SQL: Count lost values by batch

I have a table test with column Batch and ID. I would like to count how many IDs are missing in every batch compared with the earliest batch, like comparing batch 2 vs batch 1 for the value of batch 2 below.
SELECT COUNT(T1.ID) AS LOST_CNT FROM
(SELECT * FROM TEST WHERE BATCH=1)T1
LEFT JOIN (SELECT * FROM TEST WHERE BATCH=2)T2
ON T1.ID=T2.ID WHERE T2.ID IS NULL
I would like to get lost_cnt for every batch, as the number of batches will increase over time. Something like the query below does not return what I want. (I understand why, I'm just putting it here as a failed attempt.)
SELECT A.BATCH,
COUNT(DISTINCT CASE WHEN A.ID IS NULL THEN M.ID ELSE NULL END) AS lost_cnt
FROM
(SELECT DISTINCT ID FROM TEST WHERE BATCH=(SELECT MIN(BATCH) FROM TEST)) M
LEFT JOIN TEST A ON M.ID=A.ID
GROUP BY 1;
Is there a way to get what I want?
It's not totally clear what you want to achieve, but I guess you want to find how many ids are missing compared to the first batch. You can just filter the table with the id in the first batch, count the number of id's in each batch and subtract from the count for the first batch.
with t as (
select *
from test
where id in (
select id
from test
where batch = (select min(batch) from test)
)
)
select
batch,
(select count(distinct id)
from t
where batch = (select min(batch) from test)
) - count(distinct id) as missing
from t
group by batch
order by batch;
sample data:
batch id
1 1
1 2
1 3
2 2
2 3
2 4
3 3
3 4
results:
batch missing
1 0
2 1
3 2
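A join-based variant, closer in spirit to the attempt in the question, pairs every batch with every id of the first batch and counts the pairs that have no matching row. This is only a sketch run in SQLite through Python's sqlite3 module; the CTE names first_ids and batches are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (batch INTEGER, id INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4)],
)

query = """
WITH first_ids AS (          -- ids present in the earliest batch
    SELECT DISTINCT id
    FROM test
    WHERE batch = (SELECT MIN(batch) FROM test)
),
batches AS (                 -- one row per batch
    SELECT DISTINCT batch FROM test
)
SELECT b.batch,
       COUNT(DISTINCT f.id) - COUNT(DISTINCT t.id) AS lost_cnt
FROM batches b
CROSS JOIN first_ids f
LEFT JOIN test t
       ON t.batch = b.batch AND t.id = f.id
GROUP BY b.batch
ORDER BY b.batch
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this gives the same counts as the results table above (0, 1 and 2 for batches 1, 2 and 3).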
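You can use the LAG analytic function to find the previous batch, and then NOT EXISTS to return the ids of each batch that do not appear in that previous batch. LAG is applied to the distinct batch values so that rows within the same batch do not see their own batch as the previous one:
SELECT T.BATCH, T.ID
FROM YOUR_TABLE T
JOIN ( SELECT BATCH,
              LAG(BATCH) OVER( ORDER BY BATCH) AS PREV_BATCH
       FROM ( SELECT DISTINCT BATCH FROM YOUR_TABLE ) D ) B
  ON B.BATCH = T.BATCH
WHERE NOT EXISTS (
    SELECT 1
    FROM YOUR_TABLE TT
    WHERE TT.BATCH = B.PREV_BATCH
    AND TT.ID = T.ID)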
In Hive, I would approach this using window functions:
with first_batch as (
      select t.*, count(*) over () as num_in_first_batch
      from (select t.*,
                   min(batch) over () as min_batch
            from t
           ) t
      where batch = min_batch
     )
select t.batch,
       count(fb.id) as num_from_first_batch,
       (max(fb.num_in_first_batch) - count(fb.id)) as num_missing_from_first_batch
from t left join
     first_batch fb
     on t.id = fb.id
group by t.batch;

How to select the top 3 values from a group based on date and exclude duplicate value?

I have three columns: one with an ID, one with a value and one with a date. For example, the ID column has ID1, ID2, ID3, and each ID has several numeric values, say 1, 2, 3, 4, 5.
How do I get only the 3 most recent rows for each ID, ordered by date descending?
I am using Sybase SQL. Is there any way I can write this?
I tried to use Row_number() and rank() but I don't get to use either of those functions with my SQL tool.
ID value Date
1 3 20190511
1 1 20190503
1 5 20190401
2 2 20190520
2 1 20190514
2 4 20190503
3 1 20190516
3 5 20190415
3 3 20190402
If you don't have row_number try this
SELECT *
FROM yourTable t1
WHERE (SELECT COUNT(*)
FROM yourTable t2
WHERE t1.id = t2.id
AND t1.date < t2.date) < 3
So a row that has 3 or more newer rows for the same id won't appear.
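To see the correlated count at work without any window function, here is a small sketch in SQLite through Python's sqlite3 module. It uses the sample rows from the question plus two extra older rows for id 1, which are hypothetical and added only so that something gets filtered out. Dates are stored as YYYYMMDD text, so string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, value INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(1, 3, "20190511"), (1, 1, "20190503"), (1, 5, "20190401"),
     (2, 2, "20190520"), (2, 1, "20190514"), (2, 4, "20190503"),
     (3, 1, "20190516"), (3, 5, "20190415"), (3, 3, "20190402"),
     # hypothetical older rows, not in the question, that should be excluded
     (1, 2, "20190301"), (1, 4, "20190215")],
)

# Keep a row only if fewer than 3 rows of the same id are newer than it.
query = """
SELECT id, value, date
FROM yourTable t1
WHERE (SELECT COUNT(*)
       FROM yourTable t2
       WHERE t2.id = t1.id
         AND t2.date > t1.date) < 3
ORDER BY id, date DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The two extra rows for id 1 are dropped because each has at least three newer rows with the same id. If two rows of an id share a date, more than three rows can come back for that id.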
With row_number:
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) as rn
FROM YourTable t1
) as t
WHERE t.rn <= 3
This assumes you can't have multiple rows with the same date for an id. If you can, you may want to use RANK() or DENSE_RANK() instead and decide how to handle ties.
One method uses a correlated subquery with in:
select t.*
from t
where t.date in (select top (3) t2.date
from t t2
where t2.id = t.id
order by t2.date desc
);
Note that this assumes that the dates are unique.

Random records in Oracle table based on conditions

I have an Oracle table with the following columns
Table Structure
In a query I need to return all the records with CPER>=40 which is trivial. However, apart from CPER>=40 I need to list 5 random records for each CPID.
I have attached a sample list of records. However, in my table I have around 50,000 records.
Appreciate if you can help.
Oracle solution:
with CTE as
(
select t1.*,
       row_number() over (partition by CPID order by DBMS_RANDOM.VALUE) as rn -- random order within each CPID
from MyTable t1
where CPER < 40
)
select *
from CTE
where rn <= 5 -- pick 5 at random per CPID
union all
select t2.*, null
from MyTable t2
where CPER >= 40
SQL Server:
with CTE as
(
select t1.*,
       row_number() over (partition by CPID order by newid()) as rn -- random order within each CPID
from MyTable t1
where CPER < 40
)
select *
from CTE
where rn <= 5 -- pick 5 at random per CPID
union all
select t2.*, null
from MyTable t2
where CPER >= 40
How about something like this...
SELECT *
FROM (SELECT CID,
CVAL,
CPID,
CPER,
Row_number() OVER (partition BY CPID ORDER BY CPID ASC ) AS RN
FROM Table) tmp
WHERE CPER >= 40 OR RN <= 5
However, this is not random.
Assuming that you want five additional random records, you can do:
select t.*
from (select t.*,
row_number() over (partition by cpid,
(case when cper >= 40 then 1 else 2 end)
order by dbms_random.value
) as seqnum
from t
) t
where seqnum <= 5 or cper >= 40;
The row_number() is enumerating the rows for each cpid in two groups -- based on the cper value. The outer where is taking all cper values in the range you want as well as five from the other group.
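The two-group numbering can be tried out in SQLite through Python's sqlite3 module. This is only a sketch: the table t, its three columns and the generated rows are hypothetical stand-ins for the real table, and SQLite's random() takes the place of dbms_random.value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cid INTEGER, cpid INTEGER, cper INTEGER)")

# Hypothetical data: 3 cpids with 20 rows each, cper = 0, 5, ..., 95,
# so every cpid has 12 rows with cper >= 40 and 8 rows below 40.
data = [(cpid * 100 + i, cpid, i * 5) for cpid in (1, 2, 3) for i in range(20)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", data)

# Rows are numbered separately for the ">= 40" and "< 40" groups of each cpid.
# The outer filter keeps every ">= 40" row and the first 5 of the other group.
query = """
SELECT cid, cpid, cper
FROM (SELECT t.*,
             ROW_NUMBER() OVER (
                 PARTITION BY cpid, CASE WHEN cper >= 40 THEN 1 ELSE 2 END
                 ORDER BY random()
             ) AS seqnum
      FROM t) AS ranked
WHERE cper >= 40 OR seqnum <= 5
"""
rows = conn.execute(query).fetchall()

for cpid in (1, 2, 3):
    below = [r for r in rows if r[1] == cpid and r[2] < 40]
    above = [r for r in rows if r[1] == cpid and r[2] >= 40]
    print(cpid, "rows >= 40:", len(above), "random rows below 40:", len(below))
```

Which five low rows come back changes from run to run, but the counts per cpid stay the same.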

Find the latest 3 records with the same status

I need to find each user whose latest 3 records all have the status 'Fail'. At first it seems easy but I just can't seem to get it right.
So in a table of:
ID Date Status
1 2017-01-01 Fail
1 2017-01-02 Fail
1 2017-02-04 Fail
1 2015-03-21 Pass
1 2014-02-19 Fail
1 2016-10-23 Pass
2 2017-01-01 Fail
2 2017-01-02 Pass
2 2017-02-04 Fail
2 2016-10-23 Fail
I would expect ID 1 to be returned as the most recent 3 records are fails, but not ID 2, as they have a pass within their three fails. Each user may have any number of Pass and Fail records. There are thousands of different IDs
So far I've tried a CTE with ROW_NUMBER() to order the attempts but can't think of a way to ensure that the latest three results all have the same status of Fail.
Expected Results
ID Latest Fail Date Count
1 2017-02-04 3
Maybe try something like this:
WITH cte
AS
(
SELECT id,
date,
status,
ROW_NUMBER () OVER (PARTITION BY id ORDER BY date DESC) row
FROM #table
),cte2
AS
(
SELECT id, max(date) as date, count(*) AS count
FROM cte
WHERE status = 'fail'
AND row <= 3
GROUP BY id
)
SELECT id,
date AS latest_fail,
count
FROM cte2
WHERE count = 3
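To check the CTE approach above against the sample data from the question, here is a minimal sketch in SQLite through Python's sqlite3 module. The table name attempts and the aliases rn and fail_count are assumptions for this illustration; the logic is the same: number each user's rows from newest to oldest, keep the fails among the newest three, and require that there are exactly three of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attempts (id INTEGER, date TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO attempts VALUES (?, ?, ?)",
    [(1, "2017-01-01", "Fail"), (1, "2017-01-02", "Fail"), (1, "2017-02-04", "Fail"),
     (1, "2015-03-21", "Pass"), (1, "2014-02-19", "Fail"), (1, "2016-10-23", "Pass"),
     (2, "2017-01-01", "Fail"), (2, "2017-01-02", "Pass"), (2, "2017-02-04", "Fail"),
     (2, "2016-10-23", "Fail")],
)

query = """
WITH ranked AS (
    SELECT id, date, status,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
    FROM attempts
)
SELECT id, MAX(date) AS latest_fail, COUNT(*) AS fail_count
FROM ranked
WHERE status = 'Fail'   -- only fails ...
  AND rn <= 3           -- ... among the three newest rows of the user
GROUP BY id
HAVING COUNT(*) = 3     -- all three newest rows were fails
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only ID 1 is returned: ID 2 has a Pass on 2017-01-02 among its three newest rows, so it only reaches a count of 2.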
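Check this. The rows are numbered before any status filter, so a Pass among the latest three records excludes the ID:
with CTE as
(
select *, ROW_NUMBER() over (partition by id order by date desc) rnk
from temp
)
select ID, max(DATE) as Latest_Fail_Date, COUNT(rnk) as count
from CTE
where rnk <= 3
group by ID
having min(Status) = 'Fail' and max(Status) = 'Fail' and COUNT(rnk) = 3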
I think you can do this using cross apply:
select i.id
from (select distinct id from t) i cross apply
(select sum(case when ti.status = 'Fail' then 1 else 0 end) as numFails
from (select top 3 t.*
from t
where t.id = i.id
order by date desc
) ti
) ti
where numFails = 3;
Note: You probably have a table with all the ids. If so, you can use that instead of the select distinct subquery.
Or, similarly:
select i.id
from (select distinct id from t) i cross apply
(select top 3 t.*
from t
where t.id = i.id
order by date desc
) ti
group by i.id
having min(ti.status) = 'Fail' and max(ti.status) = 'Fail' and
count(*) = 3;
Here you go:
declare @numOfTries int = 3;
with fails_nums as
(
select *, row_number() over (partition by ID order by [Date] desc) as rn
from #fails
)
select ID, max([Date]) [Date], count(*) as [count]
from fails_nums fn1
where fn1.rn <= @numOfTries
group by ID
having count(case when [Status]='Fail' then [Status] end) = @numOfTries

SQL Get rows based on conditions

I'm currently having trouble writing the business logic to get rows from a table with id's and a flag which I have appended to it.
For example,
id: id seq num: flag: Date:
A 1 N ..
A 2 N ..
A 3 N
A 4 Y
B 1 N
B 2 Y
B 3 N
C 1 N
C 2 N
The end result I'm trying to achieve is that:
For each unique ID I just want to retrieve one row with the condition for that row being that
If the flag was a "Y" then return that row.
Else return the last "N" row.
Another thing to note is that the 'Y' flag is not always necessarily the last.
I've been trying to get a case condition using a partition like
OVER (PARTITION BY A."ID" ORDER BY A."Seq num") but so far no luck.
-- EDIT:
From the table, the sample result would be:
id: id seq num: flag: date:
A 4 Y ..
B 2 Y ..
C 2 N ..
Using a window clause is the right idea. You should partition the results by the ID (as you've done), and order them so the Y flag rows come first, then all the N flag rows in descending date order, and pick the first for each id:
SELECT id, id_seq_num, flag, date
FROM (SELECT id, id_seq_num, flag, date,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY CASE flag WHEN 'Y' THEN 0
ELSE 1
END ASC,
date DESC) AS rk
FROM mytable) t
WHERE rk = 1
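The flag-first ordering can be checked on the sample rows from the question with a short sketch in SQLite through Python's sqlite3 module. The dates are elided in the question, so this version falls back to id_seq_num descending as the tie-breaker among the 'N' rows. That substitution is an assumption made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, id_seq_num INTEGER, flag TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("A", 1, "N"), ("A", 2, "N"), ("A", 3, "N"), ("A", 4, "Y"),
     ("B", 1, "N"), ("B", 2, "Y"), ("B", 3, "N"),
     ("C", 1, "N"), ("C", 2, "N")],
)

# Within each id, 'Y' rows sort first; among the rest the highest
# sequence number wins. Row 1 of each partition is the row to return.
query = """
SELECT id, id_seq_num, flag
FROM (SELECT id, id_seq_num, flag,
             ROW_NUMBER() OVER (
                 PARTITION BY id
                 ORDER BY CASE flag WHEN 'Y' THEN 0 ELSE 1 END,
                          id_seq_num DESC
             ) AS rk
      FROM mytable) AS ranked
WHERE rk = 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This returns A 4 Y, B 2 Y and C 2 N, matching the sample result in the question, including for B where the 'Y' row is not the last one.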
My approach is to take a UNION of two queries. The first query simply selects all Yes records, assuming that Yes only appears once per ID group. The second query targets only those ID having no Yes anywhere. For those records, we use the row number to select the most recent No record.
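WITH cte1 AS (
    -- ids that have no 'Y' row at all
    SELECT id
    FROM yourTable
    GROUP BY id
    HAVING SUM(CASE WHEN flag = 'Y' THEN 1 ELSE 0 END) = 0
),
cte2 AS (
    -- number the rows of those ids, most recent first
    SELECT t1.id, t1."id seq num", t1.flag, t1.date,
           ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t1."id seq num" DESC) rn
    FROM yourTable t1
    INNER JOIN cte1 t2
        ON t1.id = t2.id
)
SELECT id, "id seq num", flag, date
FROM yourTable
WHERE flag = 'Y'
UNION ALL
SELECT id, "id seq num", flag, date
FROM cte2
WHERE rn = 1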
Here's one way (with quite generic SQL):
select t1.*
from Table1 as t1
where t1.id_seq_num = COALESCE(
(select max(id_seq_num) from Table1 as T2 where t1.id = t2.id and t2.flag = 'Y') ,
(select max(id_seq_num) from Table1 as T3 where t1.id = t3.id and t3.flag = 'N') )
Available in a fiddle here: http://sqlfiddle.com/#!9/5f7f9/6
SELECT DISTINCT id, flag
FROM yourTable