SQL: Count lost values by batch

I have a table test with columns Batch and ID. I would like to count how many IDs are missing from every batch compared with the earliest batch. For example, the query below compares batch 2 with batch 1 to get the value for batch 2:
SELECT COUNT(T1.ID) AS LOST_CNT FROM
(SELECT * FROM TEST WHERE BATCH=1)T1
LEFT JOIN (SELECT * FROM TEST WHERE BATCH=2)T2
ON T1.ID=T2.ID WHERE T2.ID IS NULL
I would like to get lost_cnt for every batch, since the number of batches will increase over time. The query below does not return what I want (I understand why; I'm just including it as a failed attempt):
SELECT A.BATCH,
COUNT(DISTINCT CASE WHEN A.ID IS NULL THEN M.ID ELSE NULL END) AS lost_cnt
FROM
(SELECT DISTINCT ID FROM TEST WHERE BATCH=(SELECT MIN(BATCH) FROM TEST)) M
LEFT JOIN TEST A ON M.ID=A.ID
GROUP BY 1;
Is there a way to get what I want?

It's not totally clear what you want to achieve, but I guess you want to find how many ids are missing compared to the first batch. You can filter the table to the ids in the first batch, count the distinct ids in each batch, and subtract that from the count for the first batch.
with t as (
select *
from test
where id in (
select id
from test
where batch = (select min(batch) from test)
)
)
select
batch,
(select count(distinct id)
from t
where batch = (select min(batch) from test)
) - count(distinct id) as missing
from t
group by batch
order by batch;
sample data:
batch id
1 1
1 2
1 3
2 2
2 3
2 4
3 3
3 4
results:
batch missing
1 0
2 1
3 2
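A minimal sketch that runs the query above against the sample data, assuming Python's built-in sqlite3 module as a stand-in database. One caveat the sample data hides: a batch that contains none of the first batch's ids has no rows in t, so it drops out of the result instead of reporting the whole first batch as missing.

```python
import sqlite3

# Sample data from the question as (batch, id) pairs.
SAMPLE = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4)]

QUERY = """
WITH t AS (
    -- Keep only rows whose id also appears in the earliest batch.
    SELECT *
    FROM test
    WHERE id IN (SELECT id FROM test
                 WHERE batch = (SELECT MIN(batch) FROM test))
)
SELECT batch,
       (SELECT COUNT(DISTINCT id) FROM t
        WHERE batch = (SELECT MIN(batch) FROM test))
       - COUNT(DISTINCT id) AS missing
FROM t
GROUP BY batch
ORDER BY batch
"""


def missing_per_batch(rows):
    """Return [(batch, missing)] using the filter-and-subtract query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE test (batch INTEGER, id INTEGER)")
        con.executemany("INSERT INTO test VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(missing_per_batch(SAMPLE))  # [(1, 0), (2, 1), (3, 2)]
```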

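You can use the LAG analytic function to find the previous batch and then use NOT EXISTS to list the ids of that batch which are missing from the current one. Apply LAG to the distinct batch values: over the raw rows, LAG would mostly return the row's own batch. Note that this compares each batch with the previous batch rather than with the earliest one:
WITH B AS (
SELECT BATCH, LAG(BATCH) OVER (ORDER BY BATCH) AS PREV_BATCH
FROM (SELECT DISTINCT BATCH FROM YOUR_TABLE) D
)
SELECT B.BATCH, P.ID
FROM B
JOIN YOUR_TABLE P ON P.BATCH = B.PREV_BATCH
WHERE NOT EXISTS (
SELECT 1
FROM YOUR_TABLE TT
WHERE TT.BATCH = B.BATCH
AND TT.ID = P.ID)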
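A sketch of the previous-batch comparison, assuming SQLite (through Python's sqlite3, version 3.25 or later for LAG) as the test engine and the question's table name test. LAG is applied to the distinct batch values so that each batch is paired with its real predecessor, and the ids of the previous batch that are absent from the current batch are listed. Wrapping the final SELECT in a GROUP BY batch with COUNT(*) would turn this into a per-batch lost count.

```python
import sqlite3

# Sample data from the question as (batch, id) pairs.
SAMPLE = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4)]

QUERY = """
WITH b AS (
    -- One row per batch, paired with the batch that precedes it.
    SELECT batch, LAG(batch) OVER (ORDER BY batch) AS prev_batch
    FROM (SELECT DISTINCT batch FROM test)
)
SELECT b.batch, p.id
FROM b
JOIN test p ON p.batch = b.prev_batch
WHERE NOT EXISTS (SELECT 1 FROM test c
                  WHERE c.batch = b.batch AND c.id = p.id)
ORDER BY b.batch, p.id
"""


def lost_vs_previous(rows):
    """Return (batch, id) for each id of the previous batch missing from batch."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE test (batch INTEGER, id INTEGER)")
        con.executemany("INSERT INTO test VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(lost_vs_previous(SAMPLE))  # [(2, 1), (3, 2)]
```

The first batch never appears, because its prev_batch is NULL and the join finds no rows for it.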

In Hive, I would approach this using window functions:
with first_batch as (
select t.*, count(*) over () as num_in_first_batch
from (select t.*,
min(batch) over () as min_batch
from t
) t
where batch = min_batch
)
select t.batch,
count(fb.id) as num_present_from_first_batch,
(max(fb.num_in_first_batch) - count(fb.id)) as num_missing_from_first_batch
from t left join
first_batch fb
on t.id = fb.id
group by t.batch;
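A sketch of the same window-function idea, run against SQLite through Python's sqlite3 as a stand-in for Hive (an assumption; the constructs used here exist in both). The first-batch rows are picked with batch = min_batch, and the first-batch size is taken with MAX() so that each batch forms a single group. If a batch shares no ids with the first batch, MAX() returns NULL and so does the missing count.

```python
import sqlite3

# Sample data from the question as (batch, id) pairs.
SAMPLE = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4)]

QUERY = """
WITH first_batch AS (
    -- Rows of the earliest batch, each carrying the size of that batch.
    SELECT t.*, COUNT(*) OVER () AS num_in_first_batch
    FROM (SELECT t.*, MIN(batch) OVER () AS min_batch FROM t) t
    WHERE batch = min_batch
)
SELECT t.batch,
       COUNT(fb.id) AS num_present_from_first_batch,
       MAX(fb.num_in_first_batch) - COUNT(fb.id) AS num_missing_from_first_batch
FROM t LEFT JOIN first_batch fb ON t.id = fb.id
GROUP BY t.batch
ORDER BY t.batch
"""


def first_batch_presence(rows):
    """Return (batch, present_from_first, missing_from_first) per batch."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (batch INTEGER, id INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(first_batch_presence(SAMPLE))  # [(1, 3, 0), (2, 2, 1), (3, 1, 2)]
```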

Related

SQL Partition by with conditions

I want to partition the data on the basis of two columns, Type and Env, and fetch the top 5 records for each partition ordered by count desc. The problem I'm facing is that I need to partition Env based on a LIKE condition.
Data:
Type Environment Count
T1 E1 1
T1 M1 2
T1 AB1 3
T2 E1 1
T2 M1 2
T2 CB1 3
T2 M1 5
The result I want (say I'm fetching only the top 1 record for now):
Type Environment Count
T1 M1 2
T1 AB1 3
T2 CB1 3
T2 M1 5
Here I'm splitting Env into two groups: env LIKE '%M%' and env NOT LIKE '%M%'.
One approach I can think of is to partition each group separately and combine the results with UNION, but this is a very expensive call because of the large amount of data I'm filtering. Is there a better way to achieve this?
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Count DESC) AS maxCount
FROM
table
WHERE
Env LIKE '%M%'
) AS t1
WHERE
t1.maxCount <= 5
UNION
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Count DESC) AS maxCount
FROM
table
WHERE
Env NOT LIKE '%M%'
) AS t1
WHERE
t1.maxCount <= 5
You would seem to want an additional partition by in your row_number():
select t.*
from (select t.*,
row_number() over (partition by type, case when environment like '%M%' then 1 else 2 end
order by count desc
) as seqnum
from t
) t
where seqnum <= 5;
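A sketch that checks the ROW_NUMBER() call, with the CASE expression inside the partition list, against the question's sample data. It assumes SQLite through Python's sqlite3; the column is written "count" in double quotes to keep it clearly separate from the COUNT() function. With n = 1 it returns exactly the four rows the question expects. Note that SQLite's LIKE is case-insensitive for ASCII characters by default, so '%M%' would also match a lowercase m there.

```python
import sqlite3

# Sample data from the question as (type, environment, count).
SAMPLE = [("T1", "E1", 1), ("T1", "M1", 2), ("T1", "AB1", 3),
          ("T2", "E1", 1), ("T2", "M1", 2), ("T2", "CB1", 3), ("T2", "M1", 5)]

QUERY = """
SELECT type, environment, "count"
FROM (SELECT t.*,
             ROW_NUMBER() OVER (
                 -- One partition per type and per LIKE group.
                 PARTITION BY type,
                              CASE WHEN environment LIKE '%M%' THEN 1 ELSE 2 END
                 ORDER BY "count" DESC) AS seqnum
      FROM t) AS ranked
WHERE seqnum <= :n
ORDER BY type, "count"
"""


def top_n(rows, n):
    """Return the top n rows by count for each (type, LIKE group) partition."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE t (type TEXT, environment TEXT, "count" INTEGER)')
        con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return con.execute(QUERY, {"n": n}).fetchall()
    finally:
        con.close()


print(top_n(SAMPLE, 1))
```

A single pass with one window function replaces the two filtered scans and the UNION.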

Fetching the next 3 or adjacent rows based upon a condition in PostgreSQL

I have a table with more than 10,000 rows, e.g.:
id text
1 abc
2 ghj
3 cde
4 hif
5 klm
6 bbc
7 jkl
8 mno
9 dbo
10 ijk
I need to fetch each row whose text matches a condition, together with the next three rows.
For example, if I query with text like '%bc%', it should return the rows with ids 1, 2, 3, 4, 6, 7, 8, 9, because rows #1 and #6 match.
Use the query below to get the desired result. I am assuming that "next" is based on ID only and that ID always increments by 1, as in your question.
If ID doesn't always increment by 1, first add a row number with ROW_NUMBER() and use it in place of id in the t2 subquery and the join condition.
select t1.id, t1.id_text
from test t1
join
(
select id from test where id_text like '%bc%'
UNION
select id+1 from test where id_text like '%bc%'
UNION
select id+2 from test where id_text like '%bc%'
UNION
select id+3 from test where id_text like '%bc%'
) t2
on t1.id = t2.id;
with -- Test data
t(i, x) as (values
(1,'abc'),(2,'ghj'),(3,'cde'),(4,'hif'),(5,'klm'),(6,'bbc'),(7,'jkl'),(8,'mno'),(9,'dbo'),(10,'ijk'))
select r.*
from
t as t0 cross join lateral (
select *
from t
where t.i >= t0.i
order by t.i
limit 4) as r
where t0.x like '%bc%'
order by r.i;
A lateral join allows the subquery to reference columns of tables listed before it in the FROM clause (here t0).
You could use something like this:
SELECT next.*
FROM test, test next
WHERE test.text LIKE '%bc%'
AND (test.id + 1 = next.id OR test.id + 2 = next.id OR test.id + 3 = next.id)
I am not going to assume that the ids have no gaps. One method uses lag():
select t.*
from (select t.*,
lag(text) over (order by id) as prev_text,
lag(text, 2) over (order by id) as prev_text2,
lag(text, 3) over (order by id) as prev_text3
from t
) t
where text like '%bc%' or
prev_text like '%bc%' or
prev_text2 like '%bc%' or
prev_text3 like '%bc%';
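To see why not assuming gap-free ids matters, here is a sketch that runs the lag() query and the earlier id + n UNION query side by side on the question's data with id 2 deleted (the deletion is an assumption made only to create a gap). It uses SQLite through Python's sqlite3 as the engine. The lag() version still returns three rows after id 1, while the id + n version returns only two.

```python
import sqlite3

# The question's rows with id 2 removed, so the ids have a gap.
GAPPED = [(1, "abc"), (3, "cde"), (4, "hif"), (5, "klm"), (6, "bbc"),
          (7, "jkl"), (8, "mno"), (9, "dbo"), (10, "ijk")]

# lag() looks at the previous rows in id order, whatever their id values.
LAG_QUERY = """
SELECT id
FROM (SELECT t.*,
             LAG(text) OVER (ORDER BY id)    AS prev_text,
             LAG(text, 2) OVER (ORDER BY id) AS prev_text2,
             LAG(text, 3) OVER (ORDER BY id) AS prev_text3
      FROM t) t
WHERE text LIKE '%bc%' OR prev_text LIKE '%bc%'
   OR prev_text2 LIKE '%bc%' OR prev_text3 LIKE '%bc%'
ORDER BY id
"""

# The id + n approach, which assumes consecutive ids.
OFFSET_QUERY = """
SELECT t1.id
FROM t t1
JOIN (SELECT id FROM t WHERE text LIKE '%bc%'
      UNION SELECT id + 1 FROM t WHERE text LIKE '%bc%'
      UNION SELECT id + 2 FROM t WHERE text LIKE '%bc%'
      UNION SELECT id + 3 FROM t WHERE text LIKE '%bc%') t2
  ON t1.id = t2.id
ORDER BY t1.id
"""


def run(query, rows):
    """Load rows into table t and return the selected ids."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (id INTEGER, text TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return [row[0] for row in con.execute(query)]
    finally:
        con.close()


print(run(LAG_QUERY, GAPPED))     # [1, 3, 4, 5, 6, 7, 8, 9]
print(run(OFFSET_QUERY, GAPPED))  # [1, 3, 4, 6, 7, 8, 9]
```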
You can also do this with one comparison, using other window functions:
select id, text
from (select t.*,
sum( (text like '%bc%')::int ) over (order by id rows between 3 preceding and current row) as cnt
from t
) t
where cnt > 0;
With an index on id, this might be the fastest approach to solving the problem.
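A sketch of the single-comparison version, assuming SQLite through Python's sqlite3. SQLite's LIKE already yields 0 or 1, so the Postgres cast ::int is dropped. Because the frame counts rows rather than id values, the query also copes with gaps in the ids; the gapped data set (id 2 removed) is an assumption added for that check.

```python
import sqlite3

# The question's sample rows as (id, text).
ROWS = [(1, "abc"), (2, "ghj"), (3, "cde"), (4, "hif"), (5, "klm"),
        (6, "bbc"), (7, "jkl"), (8, "mno"), (9, "dbo"), (10, "ijk")]

QUERY = """
SELECT id
FROM (SELECT t.*,
             -- Number of matches among this row and the 3 rows before it.
             SUM(text LIKE '%bc%') OVER (
                 ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS cnt
      FROM t) t
WHERE cnt > 0
ORDER BY id
"""


def ids_near_matches(rows):
    """Return ids of matching rows and of the 3 rows following each match."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (id INTEGER, text TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return [row[0] for row in con.execute(QUERY)]
    finally:
        con.close()


# Same data with id 2 removed: the frame counts rows, so the gap is harmless.
GAPPED = [r for r in ROWS if r[0] != 2]

print(ids_near_matches(ROWS))    # [1, 2, 3, 4, 6, 7, 8, 9]
print(ids_near_matches(GAPPED))  # [1, 3, 4, 5, 6, 7, 8, 9]
```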

Find the latest 3 records with the same status

I need to find the users whose latest 3 records all have the status 'Fail'. At first it seems easy, but I just can't seem to get it right.
So in a table of:
ID Date Status
1 2017-01-01 Fail
1 2017-01-02 Fail
1 2017-02-04 Fail
1 2015-03-21 Pass
1 2014-02-19 Fail
1 2016-10-23 Pass
2 2017-01-01 Fail
2 2017-01-02 Pass
2 2017-02-04 Fail
2 2016-10-23 Fail
I would expect ID 1 to be returned, as its most recent 3 records are fails, but not ID 2, as it has a pass within its three fails. Each user may have any number of Pass and Fail records, and there are thousands of different IDs.
So far I've tried a CTE with ROW_NUMBER() to order the attempts but can't think of a way to ensure that the latest three results all have the same status of Fail.
Expected Results
ID Latest Fail Date Count
1 2017-02-04 3
Maybe try something like this:
WITH cte
AS
(
SELECT id,
date,
status,
ROW_NUMBER () OVER (PARTITION BY id ORDER BY date DESC) row
FROM #table
),cte2
AS
(
SELECT id, max(date) as date, count(*) AS count
FROM cte
WHERE status = 'fail'
AND row <= 3
GROUP BY id
)
SELECT id,
date AS latest_fail,
count
FROM cte2
WHERE count = 3
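A sketch of this query on the question's sample, assuming SQLite through Python's sqlite3 and a hypothetical table name attempts in place of #table. The literal is written 'Fail' because SQLite's = comparison is case-sensitive by default, whereas SQL Server databases commonly use a case-insensitive collation. ISO-formatted date strings sort chronologically, so ORDER BY date DESC works on them as text.

```python
import sqlite3

# (id, date, status) rows from the question.
SAMPLE = [
    (1, "2017-01-01", "Fail"), (1, "2017-01-02", "Fail"), (1, "2017-02-04", "Fail"),
    (1, "2015-03-21", "Pass"), (1, "2014-02-19", "Fail"), (1, "2016-10-23", "Pass"),
    (2, "2017-01-01", "Fail"), (2, "2017-01-02", "Pass"), (2, "2017-02-04", "Fail"),
    (2, "2016-10-23", "Fail"),
]

QUERY = """
WITH cte AS (
    SELECT id, date, status,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
    FROM attempts
),
cte2 AS (
    -- Count fails among each user's 3 most recent records.
    SELECT id, MAX(date) AS date, COUNT(*) AS cnt
    FROM cte
    WHERE status = 'Fail' AND rn <= 3
    GROUP BY id
)
SELECT id, date AS latest_fail, cnt
FROM cte2
WHERE cnt = 3
ORDER BY id
"""


def latest_three_fails(rows):
    """Return (id, latest_fail, 3) for users whose last 3 records are fails."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE attempts (id INTEGER, date TEXT, status TEXT)")
        con.executemany("INSERT INTO attempts VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(latest_three_fails(SAMPLE))  # [(1, '2017-02-04', 3)]
```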
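Check this. Rank all of a user's records by date first and only then look at the status; filtering on Status = 'Fail' before ranking would pick each user's 3 latest fails even when a pass sits between them:
with CTE as
(
select *, ROW_NUMBER() over (partition by id order by date desc) rnk
from temp
)
select ID, max(DATE) as Latest_Fail_Date, COUNT(rnk) as count
from CTE where rnk <= 3
group by ID
having sum(case when Status = 'Fail' then 1 else 0 end) = 3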
I think you can do this using cross apply:
select i.id
from (select distinct id from t) i cross apply
(select sum(case when t.status = 'Fail' then 1 else 0 end) as numFails
from (select top 3 t.*
from t
where t.id = i.id
order by date desc
) ti
) ti
where numFails = 3;
Note: You probably have a table with all the ids. If so, you can use that instead of the select distinct subquery.
Or, similarly:
select i.id
from (select distinct id from t) i cross apply
(select top 3 t.*
from t
where t.id = i.id
order by date desc
) ti
group by i.id
having min(ti.status) = 'Fail' and max(ti.status) = 'Fail' and
count(*) = 3;
Here you go:
declare @numOfTries int = 3;
with fails_nums as
(
select *, row_number() over (partition by ID order by [Date] desc) as rn
from #fails
)
select ID, max([Date]) [Date], count(*) as [count]
from fails_nums fn1
where fn1.rn <= @numOfTries
group by ID
having count(case when [Status]='Fail' then [Status] end) = @numOfTries
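A sketch of the parameterised version, assuming SQLite through Python's sqlite3. The named placeholder :n plays the role of the T-SQL variable for the number of tries, so the window size is a bound parameter rather than a literal. The table name fails follows the answer, minus the # temp-table prefix, which is SQL Server-specific.

```python
import sqlite3

# (id, date, status) rows from the question.
SAMPLE = [
    (1, "2017-01-01", "Fail"), (1, "2017-01-02", "Fail"), (1, "2017-02-04", "Fail"),
    (1, "2015-03-21", "Pass"), (1, "2014-02-19", "Fail"), (1, "2016-10-23", "Pass"),
    (2, "2017-01-01", "Fail"), (2, "2017-01-02", "Pass"), (2, "2017-02-04", "Fail"),
    (2, "2016-10-23", "Fail"),
]

QUERY = """
WITH fails_nums AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
    FROM fails
)
SELECT id, MAX(date) AS date, COUNT(*) AS cnt
FROM fails_nums
WHERE rn <= :n
GROUP BY id
-- Every one of the latest n records must be a fail.
HAVING COUNT(CASE WHEN status = 'Fail' THEN status END) = :n
ORDER BY id
"""


def all_fails(rows, n):
    """Return (id, latest date, n) for users whose latest n records all failed."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE fails (id INTEGER, date TEXT, status TEXT)")
        con.executemany("INSERT INTO fails VALUES (?, ?, ?)", rows)
        return con.execute(QUERY, {"n": n}).fetchall()
    finally:
        con.close()


print(all_fails(SAMPLE, 3))  # [(1, '2017-02-04', 3)]
```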

SQL Get rows based on conditions

I'm currently having trouble writing the business logic to get rows from a table with ids and a flag that I have appended to it.
For example,
id: id seq num: flag: Date:
A 1 N ..
A 2 N ..
A 3 N
A 4 Y
B 1 N
B 2 Y
B 3 N
C 1 N
C 2 N
The end result I'm trying to achieve is that:
For each unique ID I just want to retrieve one row with the condition for that row being that
If the flag was a "Y" then return that row.
Else return the last "N" row.
Another thing to note is that the 'Y' row is not necessarily the last one.
I've been trying to get a case condition using a partition like
OVER (PARTITION BY A."ID" ORDER BY A."Seq num") but so far no luck.
-- EDIT:
From the table, the sample result would be:
id: id seq num: flag: date:
A 4 Y ..
B 2 Y ..
C 2 N ..
Using a window clause is the right idea. You should partition the results by the ID (as you've done), and order them so the Y flag rows come first, then all the N flag rows in descending date order, and pick the first for each id:
SELECT id, id_seq_num, flag, date
FROM (SELECT id, id_seq_num, flag, date,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY CASE flag WHEN 'Y' THEN 0
ELSE 1
END ASC,
date DESC) AS rk
FROM mytable) t
WHERE rk = 1
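A sketch of the ROW_NUMBER() approach on the question's sample rows, assuming SQLite through Python's sqlite3 and a table named mytable as in the answer. Since the sample's dates are elided (..), this version orders the N rows by id_seq_num DESC instead of by date; that substitution is an assumption, and ordering by date DESC behaves the same way when real dates are present.

```python
import sqlite3

# Sample rows from the question as (id, id_seq_num, flag).
SAMPLE = [("A", 1, "N"), ("A", 2, "N"), ("A", 3, "N"), ("A", 4, "Y"),
          ("B", 1, "N"), ("B", 2, "Y"), ("B", 3, "N"),
          ("C", 1, "N"), ("C", 2, "N")]

QUERY = """
SELECT id, id_seq_num, flag
FROM (SELECT id, id_seq_num, flag,
             ROW_NUMBER() OVER (
                 PARTITION BY id
                 -- Y rows first, then the most recent N row.
                 ORDER BY CASE flag WHEN 'Y' THEN 0 ELSE 1 END,
                          id_seq_num DESC) AS rk
      FROM mytable) t
WHERE rk = 1
ORDER BY id
"""


def pick_rows(rows):
    """Return one row per id: the Y row if any, else the last N row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE mytable (id TEXT, id_seq_num INTEGER, flag TEXT)")
        con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(pick_rows(SAMPLE))  # [('A', 4, 'Y'), ('B', 2, 'Y'), ('C', 2, 'N')]
```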
My approach is to take a UNION of two queries. The first query simply selects all Yes records, assuming that Yes only appears once per ID group. The second query targets only those ID having no Yes anywhere. For those records, we use the row number to select the most recent No record.
WITH cte1 AS (
SELECT id
FROM yourTable
GROUP BY id
HAVING SUM(CASE WHEN flag = 'Y' THEN 1 ELSE 0 END) = 0
),
cte2 AS (
SELECT t1.*,
ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t1."id seq" DESC) rn
FROM yourTable t1
INNER JOIN cte1 t2
ON t1.id = t2.id
)
SELECT t.*, 1 AS rn
FROM yourTable t
WHERE flag = 'Y'
UNION ALL
SELECT *
FROM cte2 t2
WHERE t2.rn = 1
Here's one way (with quite generic SQL):
select t1.*
from Table1 as t1
where t1.id_seq_num = COALESCE(
(select max(id_seq_num) from Table1 as T2 where t1.id = t2.id and t2.flag = 'Y') ,
(select max(id_seq_num) from Table1 as T3 where t1.id = t3.id and t3.flag = 'N') )
Available in a fiddle here: http://sqlfiddle.com/#!9/5f7f9/6
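A sketch of the COALESCE approach on the same sample, assuming SQLite through Python's sqlite3 and a column named id_seq_num as in the query. The Y lookup wins whenever it is non-NULL; otherwise the highest N sequence number is used. With several Y rows per id, the one with the highest sequence number would be returned.

```python
import sqlite3

# Sample rows from the question as (id, id_seq_num, flag).
SAMPLE = [("A", 1, "N"), ("A", 2, "N"), ("A", 3, "N"), ("A", 4, "Y"),
          ("B", 1, "N"), ("B", 2, "Y"), ("B", 3, "N"),
          ("C", 1, "N"), ("C", 2, "N")]

QUERY = """
SELECT t1.id, t1.id_seq_num, t1.flag
FROM Table1 AS t1
WHERE t1.id_seq_num = COALESCE(
    -- Highest Y sequence number for this id, if there is one ...
    (SELECT MAX(id_seq_num) FROM Table1 AS t2 WHERE t1.id = t2.id AND t2.flag = 'Y'),
    -- ... otherwise the highest N sequence number.
    (SELECT MAX(id_seq_num) FROM Table1 AS t3 WHERE t1.id = t3.id AND t3.flag = 'N'))
ORDER BY t1.id
"""


def pick_rows_coalesce(rows):
    """Return one row per id using correlated MAX() lookups."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Table1 (id TEXT, id_seq_num INTEGER, flag TEXT)")
        con.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(pick_rows_coalesce(SAMPLE))  # [('A', 4, 'Y'), ('B', 2, 'Y'), ('C', 2, 'N')]
```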
SELECT DISTINCT id, flag
FROM yourTable

Moving Average / Rolling Average

I have 2 columns in MS SQL: one is the serial number (SNo) and the other is values. I need a third column that gives the sum of the value in that row and the next 2.
Ex
SNo values
1 2
2 3
3 1
4 2
5 6
7 9
8 3
9 2
So I need a third column containing 2+3+1, 3+1+2, and so on; the 8th and 9th rows will not have any value:
1 2 6
2 3 6
3 1 4
4 2 5
5 1 6
7 2 7
8 3
9 2
Can the solution be generic, so that I can vary the window size from 3 numbers to a bigger number, say 60?
Here is the SQL Fiddle that demonstrates the following query:
WITH TempS as
(
SELECT s.SNo, s.value,
ROW_NUMBER() OVER (ORDER BY s.SNo) AS RowNumber
FROM MyTable AS s
)
SELECT m.SNo, m.value,
(
SELECT SUM(s.value)
FROM TempS AS s
WHERE RowNumber >= m.RowNumber
AND RowNumber <= m.RowNumber + 2
) AS Sum3InRow
FROM TempS AS m
In your question you asked to sum 3 consecutive values, and then modified it to say that the number of consecutive records to sum could change. In the above query you simply need to change m.RowNumber + 2 to whatever you need.
So if you need 60, then use
m.RowNumber + 59
As you can see it is very flexible since you only have to change one number.
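A sketch of the correlated-subquery version with the window size as a parameter, assuming SQLite through Python's sqlite3 and a value column named value as in the query. The bound parameter :extra stands in for the + 2 (or + 59) offset. Like the query above, it returns partial sums for the last rows rather than NULL, and it numbers rows with ROW_NUMBER(), so the missing SNo 6 does not matter.

```python
import sqlite3

# (SNo, value) rows from the question; note that SNo 6 is absent.
SAMPLE = [(1, 2), (2, 3), (3, 1), (4, 2), (5, 6), (7, 9), (8, 3), (9, 2)]

QUERY = """
WITH TempS AS (
    SELECT s.SNo, s.value, ROW_NUMBER() OVER (ORDER BY s.SNo) AS RowNumber
    FROM MyTable AS s
)
SELECT m.SNo, m.value,
       (SELECT SUM(s.value)
        FROM TempS AS s
        WHERE s.RowNumber >= m.RowNumber
          AND s.RowNumber <= m.RowNumber + :extra) AS SumInRow
FROM TempS AS m
ORDER BY m.SNo
"""


def rolling_sum(rows, window):
    """Sum of each row and the following window - 1 rows (partial at the end)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE MyTable (SNo INTEGER, value INTEGER)")
        con.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
        return con.execute(QUERY, {"extra": window - 1}).fetchall()
    finally:
        con.close()


print(rolling_sum(SAMPLE, 3))
```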
In case the sno field is not sequential, you can use row_number() with aggregation. The sum is only returned when all three rows exist, so the last two rows get NULL:
with ss as (
select sno, [values], row_number() over (order by sno) as seqnum
from s
)
select s1.sno, s1.[values],
(case when count(s2.[values]) = 3 then sum(s2.[values]) end) as sum3
from ss s1 left outer join
ss s2
on s2.seqnum between s1.seqnum and s1.seqnum + 2
group by s1.sno, s1.[values];
select one.sno, one.[values], one.[values]+two.[values]+three.[values] as thesum
from yourtable as one
left join yourtable as two
on one.sno=two.sno-1
left join yourtable as three
on one.sno=three.sno-2
Or, as requested in your comment, you could do this (current row and 2 following covers 3 rows; use 59 following for a window of 60):
select sno, sum([values])
over (
order by sno
rows between current row and 2 following
)
from yourtable
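A sketch of the window-frame version in SQLite through Python's sqlite3, assuming the column is named "values" as in the question (quoted because VALUES is a keyword). A frame of n rows is CURRENT ROW AND n - 1 FOLLOWING, and a COUNT(*) over the same frame turns incomplete windows at the end into NULL, as the question asks. The window size is formatted into the SQL only after an int() conversion, because the frame bounds are written as constants here.

```python
import sqlite3

# (sno, values) rows from the question; note that sno 6 is absent.
SAMPLE = [(1, 2), (2, 3), (3, 1), (4, 2), (5, 6), (7, 9), (8, 3), (9, 2)]


def rolling_sum(rows, window):
    """Sum of each row and the next window - 1 rows; NULL if fewer remain."""
    size = int(window)  # only an integer is ever formatted into the SQL
    if size < 1:
        raise ValueError("window must be at least 1")
    frame = f"(ORDER BY sno ROWS BETWEEN CURRENT ROW AND {size - 1} FOLLOWING)"
    query = f"""
        SELECT sno,
               CASE WHEN COUNT(*) OVER {frame} = {size}
                    THEN SUM("values") OVER {frame} END AS total
        FROM yourtable
        ORDER BY sno
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE yourtable (sno INTEGER, "values" INTEGER)')
        con.executemany("INSERT INTO yourtable VALUES (?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


print(rolling_sum(SAMPLE, 3))
```

Because ROWS frames count physical rows, the gap at sno 6 does not break the window.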
If you need a fully generic solution, where you can sum, for example, current row + next row + 5th following row:
Step 1: Create a table listing the offsets needed: 0 = current row, 1 = next row, -1 = previous row, etc.
SELECT * FROM (VALUES
(0),(1),(2)
) o(offset)
Step 2: Use that offset table in this template (via CTE or an actual table):
WITH o AS (SELECT * FROM (VALUES (0),(1),(2) ) o(offset))
SELECT
t1.sno,
t1.value,
SUM(t2.Value)
FROM #t t1
INNER JOIN #t t2 CROSS JOIN o
ON t2.sno = t1.sno + o.offset
GROUP BY t1.sno,t1.value
ORDER BY t1.sno
Also, if SNo is not sequential, you can fetch ROW_NUMBER() and join on that instead.
WITH
o AS (SELECT * FROM (VALUES (0),(1),(2) ) o(offset)),
t AS (SELECT *,ROW_NUMBER() OVER(ORDER BY sno) i FROM #t)
SELECT
t1.sno,
t1.value,
SUM(t2.Value)
FROM t t1
INNER JOIN t t2 CROSS JOIN o
ON t2.i = t1.i + o.offset
GROUP BY t1.sno,t1.value
ORDER BY t1.sno
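A sketch of the offset-table idea in SQLite through Python's sqlite3, assuming a table t(sno, value). SQLite has no o(offset) alias syntax on a VALUES list, so the offsets go into a CTE with a column list, and the column is called off because OFFSET is a keyword there. Each offset is bound as a parameter, and rows are numbered with ROW_NUMBER() so gaps in sno do not matter. Offset 0 should normally be included; otherwise rows near the end that have no partner at any offset drop out of the inner join.

```python
import sqlite3

# (sno, value) rows from the question; note that sno 6 is absent.
SAMPLE = [(1, 2), (2, 3), (3, 1), (4, 2), (5, 6), (7, 9), (8, 3), (9, 2)]


def offset_sum(rows, offsets):
    """Sum value over the rows at the given positional offsets from each row."""
    if not offsets:
        raise ValueError("need at least one offset")
    placeholders = ", ".join("(?)" for _ in offsets)  # VALUES (?), (?), ...
    query = f"""
        WITH o(off) AS (VALUES {placeholders}),
             numbered AS (
                 SELECT sno, value, ROW_NUMBER() OVER (ORDER BY sno) AS i FROM t
             )
        SELECT t1.sno, t1.value, SUM(t2.value)
        FROM numbered AS t1
        CROSS JOIN o
        JOIN numbered AS t2 ON t2.i = t1.i + o.off
        GROUP BY t1.sno, t1.value
        ORDER BY t1.sno
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (sno INTEGER, value INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return con.execute(query, [int(x) for x in offsets]).fetchall()
    finally:
        con.close()


print(offset_sum(SAMPLE, [0, 1, 2]))
```

Non-contiguous offsets work the same way, for example [0, 2] sums each row with the row two positions later.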