Query to find ranges of consecutive rows - sql

I have file that contains a dump of a SQL table with 2 columns: int ID (auto increment identity field) and bit Flag. The flag = 0 means a record is good and the flag = 1 means a record is bad (contains an error). The goal is to find all blocks of consecutive bad records (with flag value of 1) with 1,000 or more rows. The solution shouldn't use cursors or while loops and it should use the set-based queries only (selects, joins etc).
We would like to see the actual queries used and the results in the following format:
StartID – EndID NumberOfErrorsInTheBlock
StartID – EndID NumberOfErrorsInTheBlock
……………………….
StartID – EndID NumberOfErrorsInTheBlock
For example if our data were only 30 records and we were looking for blocks with 5 or more records then the results would look as follows (see the screenshot below, the errors blocks that met the criteria are highlighted) :
[ID Range].....[Number of errors in the block]
11-15..... 5
19-25..... 7
sql file containing sample rows, dropbox

T-SQL Solution for SQL Server 2012 and Above
IF OBJECT_ID('tempdb..#tbl_ranges') IS NOT NULL
DROP TABLE #tbl_ranges;
CREATE TABLE #tbl_ranges
(
row_num INT PRIMARY KEY,
ID INT,
Flag BIT,
Label TINYINT
);
WITH cte_yourTable
AS
(
SELECT Id,
Flag,
CASE
--label min
WHEN Flag != LAG(flag,1) OVER (ORDER BY ID) THEN 1
--inner
WHEN Flag = LAG(flag,1) OVER (ORDER BY ID) AND Flag = LEAD(flag,1) OVER (ORDER BY ID) THEN 2
--end
WHEN Flag = LAG(flag,1) OVER (ORDER BY ID) AND Flag != LEAD(flag,1) OVER (ORDER BY ID) THEN 3
END label
FROM yourTable
)
INSERT INTO #tbl_ranges
SELECT ROW_NUMBER() OVER (ORDER BY ID) row_num,
ID,
Flag,
label
FROM cte_yourTable
WHERE label != 2;
SELECT A.ID ID_start,
B.ID ID_end,
B.ID - A.ID range_cnt
FROM #tbl_ranges A
INNER JOIN #tbl_ranges B
ON A.row_num = B.row_num - 1
AND A.Flag = B.Flag;
IF OBJECT_ID('tempdb..#tbl_ranges') IS NOT NULL
DROP TABLE #tbl_ranges;
Abbreviated Results:
ID_start ID_end range_cnt
----------- ----------- -----------
2 3 1
5 8 3
9 10 1
11 35 24
36 356 320
357 358 1
359 406 47
...

With out using Temp Table, This is the best solution, Here is the Answer and It is perfect example for CTE with in CTE ( Nested CTE )
With Evaluation (ID,Flag,Evaluate)
as
(select ID,Flag,Evaluate = ID-row_number() over (order by Flag,ID)
from [dbo].[SqltestRecordsNew]
where Flag = 1
),
Evaluation_Final (StartingRecordID,EndRecordID,Flag,cnt)
as
(
select min(ID) as StartingRecordID,max(ID) as EndRecordID,
Flag, cnt = count(*)
from Evaluation
group by Evaluate, Flag
)
select Concat(StartingRecordID,' - ', EndRecordID) as 'StartingRecordID - EndRecordId',
cnt as GroupItemCnt from Evaluation_Final
where cnt > 999
order by Concat(StartingRecordID,' - ', EndRecordID)
-- Test results Case 1
Select ID,Flag,
Case when Flag=1 then 'Success'
else 'Defect Data'
End as TestResults
from SqltestRecordsNew
where ID between 1494363 and 1495559
-- Test results Case 2
Select ID,Flag,
Case when Flag=1 then 'Success'
else 'Defect Data'
End as TestResults from SqltestRecordsNew
where ID between 1498409 and 1503899
-- Test results Case 3
Select ID,Flag,
Case when Flag=1 then 'Success'
else 'Defect Data'
End as TestResults from SqltestRecordsNew
where ID between 1548257 and 1550489

Related

How to compare a number with count result then use it in limit statement in redshift/sql

I have a table with two columns id and flag.
The data is very imbalanced. Only a few flag has value 1 and others are 0.
id flag
1 0
2 0
3 0
4 0
5 1
6 1
7 0
Now I want to create a balanced table. Therefore, I want get a subset from flag = 0 based on the number of records where flag = 1. Also, I don't want the number to be greater than 1000.
I am thinking about a code like this:
select *
from table
where flag = 0
order by random()
limit (least(1000,
select count(*)
from table
where flag = 1));
Expected result(Only two records have flag as 1 so I get two records with flag as 0, if there are more than 1000 records have flag as 1 I will only get 1000.):
id flag
2 0
7 0
If you want a balanced sample:
select t.*
from (select t.*, row_number() over (partition by flag order by flag) as seqnum,
sum(case when flag = 1 then 1 else 0 end) over () as cnt_1
from t
) t
where seqnum <= cnt_1;
You can change this to:
where seqnum <= least(cnt_1, 1000)
If you want an overall maximum.
You can use row_number to simulate LIMIT.
select * from (
select column1, column2, row_number() OVER() AS rownum
from table
where flag = 0 )
where rownum < 1000
If I’ve made a bad assumption please comment and I’ll refocus my answer.

Check whether an employee is present on three consecutive days

I have a table called tbl_A with the following schema:
After insert, I have the following data in tbl_A:
Now the question is how to write a query for the following scenario:
Put (1) in front of any employee who was present three days consecutively
Put (0) in front of employee who was not present three days consecutively
The output screen shoot:
I think we should use case statement, but I am not able to check three consecutive days from date. I hope I am helped in this
Thank you
select name, case when max(cons_days) >= 3 then 1 else 0 end as presence
from (
select name, count(*) as cons_days
from tbl_A, (values (0),(1),(2)) as a(dd)
group by name, adate + dd
)x
group by name
With a self-join on name and available = 'Y', we create an inner table with different combinations of dates for a given name and take a count of those entries in which the dates of the two instances of the table are less than 2 units apart i.e. for each value of a date adate, it will check for entries with its own value adate as well as adate + 1 and adate + 2. If all 3 entries are present, the count will be 3 and you will have a flag with value 1 for such names(this is done in the outer query). Try the below query:
SELECT Z.NAME,
CASE WHEN Z.CONSEQ_AVAIL >= 3 THEN 1 ELSE 0 END AS YOUR_FLAG
FROM
(
SELECT A.NAME,
SUM(CASE WHEN B.ADATE >= A.ADATE AND B.ADATE <= A.ADATE + 2 THEN 1 ELSE 0 END) AS CONSEQ_AVAIL
FROM
TABL_A A INNER JOIN TABL_A B
ON A.NAME = B.NAME AND A.AVAILABLE = 'Y' AND B.AVAILABLE = 'Y'
GROUP BY A.NAME
) Z;
Due to the complexity of the problem, I have not been able to test it out. If something is really wrong, please let me know and I will be happy to take down my answer.
--Below is My Approch
select Name,
Case WHen Max_Count>=3 Then 1 else 0 end as Presence
from
(
Select Name,MAx(Coun) as Max_Count
from
(
select Name, (count(*) over (partition by Name,Ref_Date)) as Coun from
(
select Name,adate + row_number() over (partition by Name order by Adate desc) as Ref_Date
from temp
where available='Y'
)
) group by Name
);
select name as employee , case when sum(diff) > =3 then 1 else 0 end as presence
from
(select id, name, Available,Adate, lead(Adate,1) over(order by name) as lead,
case when datediff(day, Adate,lead(Adate,1) over(order by name)) = 1 then 1 else 0 end as diff
from table_A
where Available = 'Y') A
group by name;

Replace NULL with values

Here is my challenge:
I have a log table which every time a record is changed adds a new record but puts a NULL value for each non-changed value in each record. In other words only the changed value is set, the rest unchanged fields in each row simply has a NULL value.
Now I would like to replace each NULL value with the value above it that is NOT a NULL value like below:
Source table: Task_log
ID Owner Status Flag
1 Bob Registrar T
2 Sue NULL NULL
3 NULL NULL F
4 Frank Admission T
5 NULL NULL F
6 NULL NULL T
Desired output table: Task_log
ID Owner Status Flag
1 Bob Registrar T
2 Sue Registrar T
3 Sue Registrar F
4 Frank Admission T
5 Frank Admission F
6 Frank Admission T
How do I write a query which will generate the desired output table?
One the new windowed function of SQLServer 2012 is FIRST_VALUE, wich have quite a direct name, it can be partitioned through the OVER clause, before using it is necessary to divide every column in data block, a block for a column begin when a value is found.
With Block As (
Select ID
, Owner
, OBlockID = SUM(Case When Owner Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
, Status
, SBlockID = SUM(Case When Status Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
, Flag
, FBlockID = SUM(Case When Flag Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
From Task_log
)
Select ID
, Owner = FIRST_VALUE(Owner) OVER (PARTITION BY OBlockID ORDER BY ID)
, Status = FIRST_VALUE(Status) OVER (PARTITION BY SBlockID ORDER BY ID)
, Flag = FIRST_VALUE(Flag) OVER (PARTITION BY FBlockID ORDER BY ID)
FROM Block
SQLFiddle demo
The UPDATE query is easily derived
As I mentioned in my comment, I would try to fix the process that is creating the records rather than fixing the junk data. If that is not an option, the code below should get you pointed in the right direction.
UPDATE t1
set t1.owner = COALESCE(t1.owner, t2.owner),
t1.Status = COALESCE(t1.status, t2.status),
t1.Flag = COALESCE(t1.flag, t2.flag)
FROM Task_log as t1
INNER JOIN Task_log as t2
ON t1.id = (t1.id + 1)
where t1.owner is null
OR t1.status is null
OR t1.flag is null
I can think of several approaches.
You could use a combination of COALESCE with an array aggregate function. Unfortunately it doesn't look like SQL Server supports array_agg natively (although some nice people have developed some workarounds).
You could also use a subselect for each column.
SELECT id,
(SELECT TOP 1 FROM (SELECT owner FROM ... WHERE id = outer_id AND owner IS NOT NULL order by ID desc )) AS owner,
-- other columns
You could probably do something with window functions, too.
A vanilla solution would be:
select id
, owner
, coalesce(owner, ( select owner from t t2
where id = (select max(id) from t t3
where id < t1.id and owner is not null))
) as new_owner
, flag
, coalesce(flag, ( select flag from t t2
where id = (select max(id) from t t3
where id < t1.id and flag is not null))
) as new_flag
from t t1
Rather inefficient, but should work on most DBMS

Counting if data exists in a row

Hey guys I have the below sample data which i want to query for.
MemberID AGEQ1 AGEQ2 AGEQ2
-----------------------------------------------------------------
1217 2 null null
58458 3 2 null
58459 null null null
58457 null 5 null
299576 6 5 7
What i need to do is to lookup the table and if any AGEx COLUMN contains any data then it counts the number of times there is data for that row in each column
Results example:
for memberID 1217 the count would be 1
for memberID 58458 the count would be 2
for memberID 58459 the count would be 0 or null
for memberID 58457 the count would be 1
for memberID 299576 the count would be 3
This is how it should look like in SQL if i query the entire table
1 Children - 2
2 Children - 1
3 Children - 1
0 Children - 1
So far i have been doing it using the following query which isnt very efficient and does give incorrect tallies as there are multiple combinations that people can answer the AGE question. Also i have to write multiple queries and change the is null to is not null depending on how many children i am looking to count a person has
select COUNT (*) as '1 Children' from Member
where AGEQ1 is not null
and AGEQ2 is null
and AGEQ3 is null
The above query only gives me an answer of 1 but i want to be able to count the other columns for data as well
Hope this is nice and clear and thank you in advance
If all of the columns are integers, you can take advantage of integer math - dividing the column by itself will yield 1, unless the value is NULL, in which case COALESCE can convert the resulting NULL to 0.
SELECT
MemberID,
COALESCE(AGEQ1 / AGEQ1, 0)
+ COALESCE(AGEQ2 / AGEQ2, 0)
+ COALESCE(AGEQ3 / AGEQ3, 0)
+ COALESCE(AGEQ4 / AGEQ4, 0)
+ COALESCE(AGEQ5 / AGEQ5, 0)
+ COALESCE(AGEQ6 / AGEQ6, 0)
FROM dbo.table_name;
To get the number of people with each count of children, then:
;WITH y(y) AS
(
SELECT TOP (7) rn = ROW_NUMBER() OVER
(ORDER BY [object_id]) - 1 FROM sys.objects
),
x AS
(
SELECT
MemberID,
x = COALESCE(AGEQ1 / AGEQ1, 0)
+ COALESCE(AGEQ2 / AGEQ2, 0)
+ COALESCE(AGEQ3 / AGEQ3, 0)
+ COALESCE(AGEQ4 / AGEQ4, 0)
+ COALESCE(AGEQ5 / AGEQ5, 0)
+ COALESCE(AGEQ6 / AGEQ6, 0)
FROM dbo.table_name
)
SELECT
NumberOfChildren = y.y,
NumberOfPeopleWithThatMany = COUNT(x.x)
FROM y LEFT OUTER JOIN x ON y.y = x.x
GROUP BY y.y ORDER BY y.y;
I'd look at using UNPIVOT. That will make your wide column into rows. Since you don't care about what value was in a column, just the presence/absence of value, this will generate a row per not-null column.
The trick then becomes mashing that into the desired output format. It could probably have been done cleaner but I'm a fan of "showing my work" so that others can conform it to their needs.
SQLFiddle
-- Using the above logic
WITH HadAges AS
(
-- Find everyone and determine number of rows
SELECT
UP.MemberID
, count(1) AS rc
FROM
dbo.Member AS M
UNPIVOT
(
ColumnValue for ColumnName in (AGEQ1, AGEQ2, AGEQ3)
) AS UP
GROUP BY
UP.MemberID
)
, NoAge AS
(
-- Account for those that didn't show up
SELECT M.MemberID
FROM
dbo.Member AS M
EXCEPT
SELECT
H.MemberID
FROM
HadAges AS H
)
, NUMBERS AS
(
-- Allowable range is 1-6
SELECT TOP 6
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS TheCount
FROM
sys.all_columns AS SC
)
, COMBINATION AS
(
-- Link those with rows to their count
SELECT
N.TheCount AS ChildCount
, H.MemberID
FROM
NUMBERS AS N
LEFT OUTER JOIN
HadAges AS H
ON H.rc = N.TheCount
UNION ALL
-- Deal with the unlinked
SELECT
0
, NA.MemberID
FROM
NoAge AS NA
)
SELECT
C.ChildCount
, COUNT(C.MemberID) AS Instances
FROM
COMBINATION AS C
GROUP BY
C.ChildCount;
Try this:
select id, a+b+c+d+e+f
from ( select id,
case when age1 is null then 0 else 1 end a,
case when age2 is null then 0 else 1 end b,
case when age3 is null then 0 else 1 end c,
case when age4 is null then 0 else 1 end d,
case when age5 is null then 0 else 1 end e,
case when age6 is null then 0 else 1 end f
from ages
) as t
See here in fiddle http://sqlfiddle.com/#!3/88020/1
To get the quantity of persons with childs
select childs, count(*) as ct
from (
select id, a+b+c+d+e+f childs
from
(
select
id,
case when age1 is null then 0 else 1 end a,
case when age2 is null then 0 else 1 end b,
case when age3 is null then 0 else 1 end c,
case when age4 is null then 0 else 1 end d,
case when age5 is null then 0 else 1 end e,
case when age6 is null then 0 else 1 end f
from ages ) as t
) ct
group by childs
order by 1
See it here at fiddle http://sqlfiddle.com/#!3/88020/24

SQL - Combining incomplete

I'm using Oracle 10g. I have a table with a number of fields of varying types. The fields contain observations that have been made by made about a particular thing on a particular date by a particular site.
So:
ItemID, Date, Observation1, Observation2, Observation3...
There are about 40 Observations in each record. The table structure cannot be changed at this point in time.
Unfortunately not all the Observations have been populated (either accidentally or because the site is incapable of making that recording). I need to combine all the records about a particular item into a single record in a query, making it as complete as possible.
A simple way to do this would be something like
SELECT
ItemID,
MAX(Date),
MAX(Observation1),
MAX(Observation2)
etc.
FROM
Table
GROUP BY
ItemID
But ideally I would like it to pick the most recent observation available, not the max/min value. I could do this by writing sub queries in the form
SELECT
ItemID,
ObservationX,
ROW_NUMBER() OVER (PARTITION BY ItemID ORDER BY Date DESC) ROWNUMBER
FROM
Table
WHERE
ObservationX IS NOT NULL
And joining all the ROWNUMBER 1s together for an ItemID but because of the number of fields this would require 40 subqueries.
My question is whether there's a more concise way of doing this that I'm missing.
Create the table and the sample date
SQL> create table observation(
2 item_id number,
3 dt date,
4 val1 number,
5 val2 number );
Table created.
SQL> insert into observation values( 1, date '2011-12-01', 1, null );
1 row created.
SQL> insert into observation values( 1, date '2011-12-02', null, 2 );
1 row created.
SQL> insert into observation values( 1, date '2011-12-03', 3, null );
1 row created.
SQL> insert into observation values( 2, date '2011-12-01', 4, null );
1 row created.
SQL> insert into observation values( 2, date '2011-12-02', 5, 6 );
1 row created.
And then use the KEEP clause on the MAX aggregate function with an ORDER BY that puts the rows with NULL observations at the end. whatever date you use in the ORDER BY needs to be earlier than the earliest real observation in the table.
SQL> ed
Wrote file afiedt.buf
1 select item_id,
2 max(val1) keep( dense_rank last
3 order by (case when val1 is not null
4 then dt
5 else date '1900-01-01'
6 end) ) val1,
7 max(val2) keep( dense_rank last
8 order by (case when val2 is not null
9 then dt
10 else date '1900-01-01'
11 end) ) val2
12 from observation
13* group by item_id
SQL> /
ITEM_ID VAL1 VAL2
---------- ---------- ----------
1 3 2
2 5 6
I suspect that there is a more elegant solution to ignore the NULL values than adding the CASE statement to the ORDER BY but the CASE gets the job done.
i dont know about commands in oracle but in sql you could use some how that
first use pivot table is contains consecutives numbers 0,1,2...
i'm not sure but in oracle the function "isnull" is "NVL"
select items.ItemId,
case p.i = 0 then observation1 else '' end as observation1,
case p.i = 0 then observation1 else '' end as observation2,
case p.i = 0 then observation1 else '' end as observation3,
...
case p.i = 39 then observation4 else '' as observation40
from (
select items.ItemId
from table as items
where items.item = _paramerter_for_retrive_only_one_item /* select one item o more item where you filter items here*/
group by items.ItemId) itemgroup
left join
(
select
items.ItemId,
p.i,
isnull( max ( case p.i = 0 then observation1 else '' end ), '' ) as observation1,
isnull( max ( case p.i = 1 then observation2 else '' end ), '' ) as observation2,
isnull( max ( case p.i = 2 then observation3 else '' end), '' ) as observation3,
...
isnull( max ( case p.i = 39 then observation4), '' ) as observation40,
from
(select i from pivot where id < 40 /*you number of columns of observations, that attach one index*/
)
as p
cross join table as items
lef join table as itemcombinations
on item.itemid = itemcombinations.itemid
where items.item = _paramerter_for_retrive_only_one_item /* select one item o more item where you filter items here*/
and (p.i = 0 and not itemcombinations.observation1 is null) /* column 1 */
and (p.i = 1 and not itemcombinations.observation2 is null) /* column 2 */
and (p.i = 2 and not itemcombinations.observation3 is null) /* column 3 */
....
and (p.i = 39 and not itemcombinations.observation3 is null) /* column 39 */
group by p.i, items.ItemId
) as itemsimplified
on itemsimplified.ItemId = itemgroup.itemId
group by itemgroup.itemId
About pivot table
create an pivot table, Take a look at that
pivot table schema
name: pivot columns: {i : datatype int}
How populate
create foo table
schema foo
name: foo column: value datatype varchar
insert into foo
values('0'),
values('1'),
values('2'),
values('3'),
values('4'),
values('5'),
values('6'),
values('7'),
values('8'),
values('9');
/* insert 100 values */
insert into pivot
select concat(a.value, a.value) /* mysql */
a.value + a.value /* sql server */
a.value | a.value /* Oracle im not sure about that sintax */
from foo a, foo b
/* insert 1000 values */
insert into pivot
select concat(a.value, b.value, c.value) /* mysql */
a.value + b.value + c.value /* sql server */
a.value | b.value | c.value /* Oracle im not sure about that sintax */
from foo a, foo b, foo c
the idea about pivot table can consult in "Transact-SQL Cookbook By Jonathan Gennick, Ales Spetic"
I have to admit that the above solution (by Justin Cave) is simpler and easier to understand but this is another good option
at the end like you said you solved