SQL to Return Missing Row

I have a scenario where I need to find missing records in a table using SQL, without using cursors, views, or stored procedures.
For a particular CustID, the initial Start_Date is 19000101 and the End_Date is an arbitrary date.
The next record for the same CustID then has its Start_Date equal to the End_Date of the previous record + 1 day.
Its End_Date is again an arbitrary date.
And so on.
The last record for the same CustID has an End_Date of 99991231.
The following data explains it better:
CustID Start_Date End_Date
1 19000101 20121231
1 20130101 20130831
1 20130901 20140321
1 20140322 99991231
Basically, I am populating data as in an SCD Type 2 (slowly changing dimension) scenario.
Now I want to find the missing records (or the CustIDs that have them).
For example, below there is no record for CustID = 4 with Start_Date = 20120606 and End_Date = 20140101:
CustID Start_Date End_Date
4 19000101 20120605
4 20140102 99991231
Code for creating the table:
CREATE TABLE TestTable
(
CustID int,
Start_Date int,
End_Date int
)
INSERT INTO TestTable values (1,19000101,20121231)
INSERT INTO TestTable values (1,20130101,20130831)
INSERT INTO TestTable values (1,20130901,20140321)
INSERT INTO TestTable values (1,20140322,99991231)
INSERT INTO TestTable values (2,19000101,99991231)
INSERT INTO TestTable values (3,19000101,20140202)
INSERT INTO TestTable values (3,20140203,99991231)
INSERT INTO TestTable values (4,19000101,20120605)
--INSERT INTO TestTable values (4,20120606,20140101) --Missing Value
INSERT INTO TestTable values (4,20140102,99991231)
The query should return CustID = 4, as it has a missing value.

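My idea is based on this logic. Let's assume 19000101 is 1 and 99991231 is 10. For every ID, if you take End_Date - Start_Date for each row and add them up, the total must equal 9 (10 - 1) when each row starts where the previous one ends. You can do the same here, with MAX_END_DATE and MIN_START_DATE standing for the fixed bounds 99991231 and 19000101:
SELECT ID, SUM(END_DATE - START_DATE) AS total FROM TABLE GROUP BY ID
HAVING SUM(END_DATE - START_DATE) < (MAX_END_DATE - MIN_START_DATE)
Use your database's function for the number of days between two dates (for example DATEDIFF in SQL Server) in the SUM part, because subtracting yyyymmdd integers does not give a day count. Also, since in your data each Start_Date is the previous End_Date + 1 day, count each row's days inclusively (days + 1) and compare the sum with the full span + 1.
Let's take the following example:
1 1900 2003
1 2003 9999
2 1900 2222
2 2222 9977
3 1900 9999
The query evaluates as follows:
1: (2003 - 1900) + (9999 - 2003) = 8099
2: (2222 - 1900) + (9977 - 2222) = 8077
3: (9999 - 1900) = 8099
The HAVING clause keeps only totals below 9999 - 1900 = 8099, so it eliminates 1 and 3 and returns only 2, which is what you want.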
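A minimal sketch of this sum check, run through Python's built-in sqlite3 module rather than SQL Server. SQLite's julianday() stands in for the day-count function, substr() turns the yyyymmdd integers into ISO dates, and each row counts its days inclusively (End - Start + 1) because consecutive rows start the day after the previous end. The sample data assumes CustID 1's last Start_Date is 20140322 and CustID 2's End_Date is 99991231. Note that overlapping rows add extra days and can hide a gap.

```python
import sqlite3

# Rows from the question (CustID, Start_Date, End_Date as yyyymmdd integers).
# Assumption: CustID 1's last row starts 20140322 and CustID 2 ends 99991231.
ROWS = [
    (1, 19000101, 20121231), (1, 20130101, 20130831),
    (1, 20130901, 20140321), (1, 20140322, 99991231),
    (2, 19000101, 99991231),
    (3, 19000101, 20140202), (3, 20140203, 99991231),
    (4, 19000101, 20120605), (4, 20140102, 99991231),
]


def jd(col):
    """SQLite expression: julian day number of a yyyymmdd integer column."""
    return (f"julianday(substr({col}, 1, 4) || '-' || "
            f"substr({col}, 5, 2) || '-' || substr({col}, 7, 2))")


# The inclusive day counts per customer must add up to the full span.
QUERY = f"""
SELECT CustID
FROM TestTable
GROUP BY CustID
HAVING SUM({jd('End_Date')} - {jd('Start_Date')} + 1)
     < julianday('9999-12-31') - julianday('1900-01-01') + 1
ORDER BY CustID
"""


def customers_with_gaps(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TestTable "
                "(CustID INTEGER, Start_Date INTEGER, End_Date INTEGER)")
    con.executemany("INSERT INTO TestTable VALUES (?, ?, ?)", rows)
    result = [cust_id for (cust_id,) in con.execute(QUERY)]
    con.close()
    return result


print(customers_with_gaps(ROWS))  # [4]
```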

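If you just need the CustID, this will do. It looks for rows (other than the final 99991231 row) that have no row for the same CustID starting on the next day; the yyyymmdd integers are converted to date so the day arithmetic works across month ends:
SELECT t1.CustID
FROM TestTable t1
LEFT JOIN TestTable t2
ON t2.CustID = t1.CustID
AND DATEADD(DAY, -1, CONVERT(date, CONVERT(char(8), t2.Start_Date))) = CONVERT(date, CONVERT(char(8), t1.End_Date))
WHERE t1.End_Date <> 99991231
AND t2.CustID IS NULL
GROUP BY t1.CustID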

You need the rows that meet one of the following conditions:
Not a final row (99991231) and no matching next row
Not a start row (19000101) and no matching previous row
You can left join the table to itself to find the previous and next rows, then keep the rows where either lookup found nothing (its columns are NULL):
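SELECT t1.CustID, t1.Start_Date, t1.End_Date
FROM TestTable t1
-- the yyyymmdd integers are converted to date, because 20130101 - 1 is 20130100, not 20121231
LEFT JOIN TestTable tPrevious on tPrevious.CustID = t1.CustID
and CONVERT(date, CONVERT(char(8), tPrevious.End_Date)) = DATEADD(DAY, -1, CONVERT(date, CONVERT(char(8), t1.Start_Date)))
LEFT JOIN TestTable tNext on tNext.CustID = t1.CustID
and DATEADD(DAY, -1, CONVERT(date, CONVERT(char(8), tNext.Start_Date))) = CONVERT(date, CONVERT(char(8), t1.End_Date))
WHERE (t1.End_Date <> 99991231 and tNext.CustID is null) -- no following
or (t1.Start_Date <> 19000101 and tPrevious.CustID is null) -- no previous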
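The previous/next self-join can be tried with Python's sqlite3 module; this is a sketch assuming the same corrected sample data as the question (CustID 1's last row starting 20140322, CustID 2 ending 99991231). SQLite's date() with a '-1 day' modifier replaces DATEADD, and only start dates are shifted, so no calculation goes past 9999-12-31.

```python
import sqlite3

ROWS = [
    (1, 19000101, 20121231), (1, 20130101, 20130831),
    (1, 20130901, 20140321), (1, 20140322, 99991231),
    (2, 19000101, 99991231),
    (3, 19000101, 20140202), (3, 20140203, 99991231),
    (4, 19000101, 20120605), (4, 20140102, 99991231),
]


def iso(col):
    """SQLite expression: 'YYYY-MM-DD' text for a yyyymmdd integer column."""
    return (f"(substr({col}, 1, 4) || '-' || "
            f"substr({col}, 5, 2) || '-' || substr({col}, 7, 2))")


# Day arithmetic runs on real dates, so 2013-01-01 minus one day is 2012-12-31.
QUERY = f"""
SELECT t1.CustID, t1.Start_Date, t1.End_Date
FROM TestTable t1
LEFT JOIN TestTable tPrevious
  ON tPrevious.CustID = t1.CustID
 AND date({iso('tPrevious.End_Date')}) = date({iso('t1.Start_Date')}, '-1 day')
LEFT JOIN TestTable tNext
  ON tNext.CustID = t1.CustID
 AND date({iso('tNext.Start_Date')}, '-1 day') = date({iso('t1.End_Date')})
WHERE (t1.End_Date <> 99991231 AND tNext.CustID IS NULL)      -- no following row
   OR (t1.Start_Date <> 19000101 AND tPrevious.CustID IS NULL) -- no previous row
ORDER BY t1.CustID, t1.Start_Date
"""


def rows_next_to_gaps(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TestTable "
                "(CustID INTEGER, Start_Date INTEGER, End_Date INTEGER)")
    con.executemany("INSERT INTO TestTable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in rows_next_to_gaps(ROWS):
    print(row)
```

Both CustID 4 rows come back: the first has no following row and the second has no previous row.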

Related

SQL Server and difficulty generating column with unique values

I'm using Microsoft SQL Server Management Studio, and I want a new column that calculates the following:
If it has an ‘Exec’ value for category, it takes the ‘enddate’.
If it has a ‘Scop’ value for category, it takes the ‘startdate’.
This new column calculates the number of months between these two.
I want SQL to do the calculation per id, so each id gets its own calculated value.
At the moment it takes the minimum enddate and minimum 'startdate' for the entire table.
SELECT
id, category, startdate, enddate,
CASE
WHEN id = id
THEN DATEDIFF(month,
(SELECT MIN(enddate) from [A].[PP] where category = 'Exec'),
(SELECT MIN(startdate) from [A].[PP] where category = 'Scop')) --AS datemodify
ELSE NULL
END
FROM
[A].[PP]
WHERE
startdate IS NOT NULL
AND (category = 'Exec' OR category = 'Scop')
ORDER BY
id ASC
Results it produces at the moment:
id category startdate enddate NewColumn
1 Scop 2022-11-1 2022-10-1 11
1 Exec 2023-11-1 2023-10-1 11
2 Scop 2022-11-1 2022-10-1 11
2 Exec 2023-11-1 2023-09-1 11
The results I want:
id category startdate enddate NewColumn
1 Scop 2021-11-1 2022-10-1 24
1 Exec 2023-11-1 2023-11-1 24
2 Scop 2022-11-1 2022-10-1 11
2 Exec 2023-11-1 2023-09-1 11
Based on the comments, I'm not sure what you want as your output, so I've come up with two different versions.
Here's how I created a version of your data set:
CREATE TABLE #TempTable (ID int, Category varchar(10), StartDate date, EndDate date);
INSERT INTO #TempTable (ID, Category, StartDate, EndDate)
VALUES (1, 'Scop', '2021-11-01', '2022-10-01'),
(1, 'Exec', '2023-11-01', '2023-10-01'),
(2, 'Scop', '2022-11-01', '2022-10-01'),
(2, 'Exec', '2023-11-01', '2023-10-01');
This is the first version. It keeps your two rows per ID but takes the StartDate and EndDate from different rows. It selects all of the data straight out of the temp table; if the row's Category is Scop, it computes a DATEDIFF between that row's StartDate and the EndDate fetched by a subquery for the same ID with Category = Exec (the same logic is applied the other way around when the row's Category is Exec):
SELECT TT.ID,
TT.Category,
TT.StartDate,
TT.EndDate,
CASE
WHEN TT.Category = 'Scop' THEN DATEDIFF(M, TT.StartDate, (SELECT EndDate FROM #TempTable WHERE Category = 'Exec' AND ID = TT.ID))
ELSE CASE
WHEN TT.Category = 'Exec' THEN DATEDIFF(M, (SELECT StartDate FROM #TempTable WHERE Category = 'Scop' AND ID = TT.ID), TT.EndDate)
END
END AS DateDiffCalc
FROM #TempTable AS TT;
This version compresses each ID to a single row. It fetches only the Scop rows, then joins the table back to itself on ID and keeps only the Exec rows from the second copy, so you can DATEDIFF between the Scop StartDate and the Exec EndDate:
SELECT DISTINCT t1.ID,
t1.Category,
t1.StartDate,
T2.Category,
T2.EndDate,
DATEDIFF(M, t1.StartDate, T2.EndDate) AS DateDiffCalc
FROM #TempTable AS t1
INNER JOIN #TempTable AS T2 ON T2.ID = t1.ID AND T2.Category = 'Exec'
WHERE t1.Category = 'Scop'
ORDER BY t1.ID;
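A small Python sketch of the single-row version's logic, using the answer's temp-table data: pair each ID's Scop StartDate with its Exec EndDate. The helper month_boundaries is a hypothetical stand-in for DATEDIFF(month, ...), which counts month boundaries crossed (the day of the month is ignored); the pairing assumes one Scop and one Exec row per ID.

```python
from datetime import date

# (ID, Category, StartDate, EndDate) as inserted into #TempTable above.
ROWS = [
    (1, "Scop", date(2021, 11, 1), date(2022, 10, 1)),
    (1, "Exec", date(2023, 11, 1), date(2023, 10, 1)),
    (2, "Scop", date(2022, 11, 1), date(2022, 10, 1)),
    (2, "Exec", date(2023, 11, 1), date(2023, 10, 1)),
]


def month_boundaries(start, end):
    """Count month boundaries crossed between start and end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def months_per_id(rows):
    # Same pairing as the self-join: Scop row as t1, Exec row as T2, matched on ID.
    scop_start = {i: s for i, cat, s, _ in rows if cat == "Scop"}
    exec_end = {i: e for i, cat, _, e in rows if cat == "Exec"}
    return {i: month_boundaries(scop_start[i], exec_end[i])
            for i in sorted(scop_start) if i in exec_end}


print(months_per_id(ROWS))  # {1: 23, 2: 11}
```

With this data ID 1 gives 23 rather than the 24 in the desired output, because its Exec EndDate here is 2023-10-01, not 2023-11-01.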

SQL - Select date ranges without overlapping

I have the following table (Oracle database):
ID valid_from valid_to
1 01.01.22 28.02.22
1 01.03.22 30.06.22
1 01.07.22 31.12.22
1 01.01.23 null
2 01.01.22 31.03.22
2 01.04.22 null
What is the best way to extract all date ranges without overlaps across both IDs? The final result set should look like this:
valid_from valid_to
01.01.22 28.02.22
01.03.22 31.03.22
01.04.22 30.06.22
01.07.22 31.12.22
01.01.23 null
Null stands for max_date (PL / SQL Oracle Max Date).
Moreover, I should only select the values valid for the current year (let's assume we are already in 2022).
Thanks for your help in advance!
You can use the following SELECT statement:
with
-- main table
t1 AS (SELECT w, q1, q2, to_date(q1,'dd.mm.yy') q1d, to_date(q2,'dd.mm.yy') q2d FROM www)
-- custom year in YYYY format
, t0 AS (SELECT '2022' y FROM dual)
-- join and order dates FROM - TO
, t2 AS (SELECT t1.q1, t1.q1d, s2.q2, s2.q2d
FROM t1
LEFT JOIN t1 s2 on t1.q1d <= s2.q2d
ORDER BY t1.q1d, s2.q2d)
-- number each start date's pairs by end date, so r = 1 is the nearest end
, t3 AS (SELECT t2.*,
row_number() OVER (PARTITION BY t2.q1d ORDER BY t2.q2d) r
FROM t2 )
-- join custom year value and select desired rows based on that value
SELECT q1, q2 FROM t3
JOIN t0 on 1=1
WHERE r = 1
-- for the custom year
AND t0.y <= to_char(q1d, 'yyyy')
ORDER BY q1d;
In my example table, the dates are stored as varchar2 in dd.mm.yy format. If your columns are of type DATE, you don't need the to_date() calls for those two fields.
Sample table used:
create table www (w integer, q1 varchar2(30), q2 varchar2(30));
insert into www values (1, '01.01.22', '28.02.22');
insert into www values (1, '01.03.22', '30.06.22');
insert into www values (1, '01.07.22', '31.12.22');
insert into www values (1, '01.01.23', '');
insert into www values (2, '01.01.22', '31.03.22');
insert into www values (2, '01.04.22', '');
If your table has more rows with NULL in valid_to whose valid_from dates don't fall in any range, for example:
insert into www values (1, '01.01.24', '');
then the previous solution will produce extra rows with NULL at the end.
In this case you can use this more complex solution:
...
-- join custom year value and select desired rows based on that value
, t4 as (SELECT q1, q2, q1d FROM t3
JOIN t0 on 1=1
WHERE r = 1 AND
-- for the custom year
t0.y <= to_char(q1d, 'yyyy')
ORDER BY q1d)
-- filter non-nullable rows
, t5 as ( SELECT q1, q2 FROM t4 WHERE Q2 IS NOT NULL )
-- max date from rows where Q2 field has null value
, t6 as ( SELECT to_char(MAX(Q1D),'dd.mm.yy') q1, q2
FROM t4
WHERE Q2 IS NULL
GROUP BY q2)
-- append rows with max date
SELECT * FROM t5
UNION ALL
SELECT * FROM t6;
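A different way to get the same split, sketched in Python rather than Oracle SQL (a swapped-in technique, not the answer's query): every valid_from, and every day after a valid_to, is a cut point where coverage can change, so consecutive cut points form ranges that cannot overlap. The names covers and elementary_ranges are hypothetical, None plays the role of the open-ended NULL valid_to, and the current-year filter from the question is left out.

```python
from datetime import date, timedelta

# Rows from the question: (ID, valid_from, valid_to); None plays the role of NULL.
ROWS = [
    (1, date(2022, 1, 1), date(2022, 2, 28)),
    (1, date(2022, 3, 1), date(2022, 6, 30)),
    (1, date(2022, 7, 1), date(2022, 12, 31)),
    (1, date(2023, 1, 1), None),
    (2, date(2022, 1, 1), date(2022, 3, 31)),
    (2, date(2022, 4, 1), None),
]


def covers(row, day):
    """True if the row's inclusive range contains day (None = open ended)."""
    _, start, end = row
    return start <= day and (end is None or day <= end)


def elementary_ranges(rows):
    # Coverage can only change on a valid_from or on the day after a valid_to.
    cuts = {start for _, start, _ in rows}
    cuts |= {end + timedelta(days=1) for _, _, end in rows if end is not None}
    cuts = sorted(cuts)
    ranges = []
    for i, start in enumerate(cuts):
        if not any(covers(row, start) for row in rows):
            continue  # a gap: no row is valid from this cut point
        end = cuts[i + 1] - timedelta(days=1) if i + 1 < len(cuts) else None
        ranges.append((start, end))
    return ranges


for start, end in elementary_ranges(ROWS):
    print(start, end)
```

For the sample rows this yields the five ranges in the expected result, ending with 2023-01-01 and an open end.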

Query populating dates

Query that generates records to hold a future calculated value.
Hi, I'm trying to write a query with the tables below to populate a collection. I want the t2 values when the dates match, but when there is no match I want the dates populated with NULL values (they will be filled in later with a calculated value). The number of records for a date should match the last time the dates matched, so in the example there should be 3 records for each day after 7/1 and just 2 for each day after 7/5. I am trying to do this in one query but I am not sure it is possible. Any help on creating this and getting it into a collection would be much appreciated.
create table t1 as
WITH DATA AS
(SELECT to_date('07/01/2019', 'MM/DD/YYYY') date1,
to_date('07/10/2019', 'MM/DD/YYYY') date2
FROM dual
)
SELECT date1+LEVEL-1 the_date,
TO_CHAR(date1+LEVEL-1, 'DY','NLS_DATE_LANGUAGE=AMERICAN') day
FROM DATA
WHERE TO_CHAR(date1+LEVEL-1, 'DY','NLS_DATE_LANGUAGE=AMERICAN')
NOT IN ('SAT', 'SUN')
CONNECT BY LEVEL <= date2-date1+1;
create table t2
(cdate date,
camount number);
-- DATE literals avoid depending on the session's NLS_DATE_FORMAT
insert into t2 values (DATE '2019-07-01', 10);
insert into t2 values (DATE '2019-07-01', 20);
insert into t2 values (DATE '2019-07-01', 30);
insert into t2 values (DATE '2019-07-05', 50);
insert into t2 values (DATE '2019-07-05', 20);
expected results:
01-JUL-19 10
01-JUL-19 20
01-JUL-19 30
02-JUL-19 null
02-JUL-19 null
02-JUL-19 null
03-JUL-19 null
03-JUL-19 null
03-JUL-19 null
04-JUL-19 null
04-JUL-19 null
04-JUL-19 null
05-JUL-19 50
05-JUL-19 20
08-JUL-19 null
08-JUL-19 null
09-JUL-19 null
09-JUL-19 null
10-JUL-19 null
10-JUL-19 null
One approach to this kind of problem is to build the result set incrementally in a few steps:
Count the matches that each THE_DATE in T1 has in T2.
Apply the rule you outlined in the question to the THE_DATE values that have zero matches: carry forward (across dates in ascending order) the number of matches of the last THE_DATE that did have matches.
Generate the extra rows in T1 for the THE_DATE values that have zero matches (e.g., if a date is supposed to have three NULL records, duplicate it up to that number).
Outer join to T2 to get the CAMOUNT where it is available.
Here's an example (the three named subfactors correspond to steps 1, 2, and 3 above):
WITH DATE_MATCH_COUNT AS (
SELECT T1.THE_DATE,
COUNT(T2.CDATE) AS MATCH_COUNT,
ROW_NUMBER() OVER (PARTITION BY NULL ORDER BY T1.THE_DATE ASC) AS ROWKEY
FROM T1
LEFT OUTER JOIN T2
ON T1.THE_DATE = T2.CDATE
GROUP BY T1.THE_DATE),
ADJUSTED_MATCH_COUNT AS (
SELECT THE_DATE,
MATCH_COUNT AS ACTUAL_MATCH_COUNT,
GREATEST(MATCH_COUNT,
(SELECT MAX(MATCH_COUNT) KEEP ( DENSE_RANK LAST ORDER BY ROWKEY ASC )
FROM DATE_MATCH_COUNT SCALAR_MATCH_COUNT
WHERE SCALAR_MATCH_COUNT.ROWKEY <= DATE_MATCH_COUNT.ROWKEY AND
SCALAR_MATCH_COUNT.MATCH_COUNT > 0)) AS FORCED_MATCH_COUNT
FROM DATE_MATCH_COUNT),
GENERATED_MATCH_ROW AS (
SELECT THE_DATE, FORCED_MATCH_COUNT, MATCH_KEY
FROM ADJUSTED_MATCH_COUNT CROSS APPLY (SELECT LEVEL AS MATCH_KEY
FROM DUAL CONNECT BY LEVEL <= DECODE(ACTUAL_MATCH_COUNT,0,FORCED_MATCH_COUNT,1)))
SELECT THE_DATE, CAMOUNT
FROM GENERATED_MATCH_ROW
LEFT OUTER JOIN T2
ON GENERATED_MATCH_ROW.THE_DATE = T2.CDATE
ORDER BY THE_DATE, CAMOUNT ASC;
Result:
THE_DATE CAMOUNT
____________ __________
01-JUL-19 10
01-JUL-19 20
01-JUL-19 30
02-JUL-19
02-JUL-19
02-JUL-19
03-JUL-19
03-JUL-19
03-JUL-19
04-JUL-19
04-JUL-19
04-JUL-19
05-JUL-19 20
05-JUL-19 50
08-JUL-19
08-JUL-19
09-JUL-19
09-JUL-19
10-JUL-19
10-JUL-19
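The same four steps can be sketched in plain Python over in-memory lists instead of SQL: count matches per date, carry the last non-zero count forward, emit that many placeholder (None) rows for unmatched dates, and keep the real CAMOUNT rows otherwise. populate is a hypothetical name; the sketch assumes the dates arrive in ascending order, and dates before the first match get no rows.

```python
from datetime import date


def populate(the_dates, t2_rows):
    """the_dates: ascending business days (T1.THE_DATE).
    t2_rows: (cdate, camount) pairs (T2).
    Returns (the_date, camount) rows, with None as a placeholder amount."""
    amounts_by_date = {}
    for cdate, camount in t2_rows:
        amounts_by_date.setdefault(cdate, []).append(camount)

    result = []
    carried = 0  # match count of the most recent date that had matches
    for d in the_dates:
        amounts = amounts_by_date.get(d, [])
        if amounts:
            carried = len(amounts)  # step 1: this date's match count
            result.extend((d, a) for a in sorted(amounts))  # step 4: real rows
        else:
            # steps 2 and 3: repeat the carried count as placeholder rows
            result.extend((d, None) for _ in range(carried))
    return result


# Weekdays from 1 to 10 July 2019, like T1 in the question.
T1 = [date(2019, 7, day) for day in range(1, 11)
      if date(2019, 7, day).weekday() < 5]
T2 = [(date(2019, 7, 1), 10), (date(2019, 7, 1), 20), (date(2019, 7, 1), 30),
      (date(2019, 7, 5), 50), (date(2019, 7, 5), 20)]

for row in populate(T1, T2):
    print(row)
```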

Highlight multiple records in a date range

Working with SQL Server 2008.
fromdate todate ID name
--------------------------------
1-Aug-16 7-Aug-16 x jack
3-Aug-16 4-Aug-16 x jack
5-Aug-16 6-Aug-16 x tom
1-Aug-16 2-Aug-16 y john
3-Aug-16 4-Aug-16 y harry
5-Aug-16 6-Aug-16 y mac
Is there a way to script this so that I know if there are multiple names tagged to an ID in the same date range?
In the example above, I want to flag that ID x has the names Jack and Tom tagged in the same date range:
ID multiple_flag
------------------------------------------------
x yes
y no
If there is a unique index in your table (in my example it is column i, but you could also generate one with ROW_NUMBER()), then you can use the following query, based on an INNER JOIN, to find overlapping date ranges:
CREATE TABLE #tmp (i int identity primary key,fromdate date,todate date,ID int,name varchar(32));
insert into #tmp (fromdate,todate,ID ,name) values
('1-Aug-16','7-Aug-16',3,'jack'),
('3-Aug-16','4-Aug-16',3,'tom'),
('5-Aug-16','6-Aug-16',3,'jack');
select a.*,b.name bname,b.i i2 from #tmp a
INNER join #tmp b on b.id=a.id AND b.i<>a.i
AND ( b.fromdate between a.fromdate and a.todate
OR b.todate between a.fromdate and a.todate)
(My ID column is int.) This will give you:
i fromdate todate ID name bname i2
- ---------- ---------- - ---- ----- --
1 2016-08-01 2016-08-07 3 jack tom 2
1 2016-08-01 2016-08-07 3 jack jack 3
Implement further filtering or grouping as required.
Please check the SQL below, though it might not be the optimal one:
SELECT fromdate,todate,id,tab1.name,
case when tab2.NameCount >1 then 'yes' else 'no' end as multiple_flag
FROM tab1
inner join (SELECT Name, COUNT(*) as NameCount
FROM tab1
GROUP BY Name) as tab2 on tab1.name=tab2.name
order by tab1.id ;
Add your WHERE condition before the ORDER BY if you need to restrict the date range.
One way to do it is with CASE WHEN EXISTS:
Please note this part of the query:
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
For an explanation of testing for overlapping ranges, read the overlap wiki.
Create and populate sample data (please save us this step in your future questions):
CREATE TABLE #T
(
fromdate date,
todate date,
ID char(1),
name varchar(10)
)
INSERT INTO #T VALUES
('2016-08-01', '2016-08-07', 'x', 'jack'),
('2016-08-03', '2016-08-04', 'x', 'tom'),
('2016-08-05', '2016-08-06', 'x', 'jack'),
('2016-08-01', '2016-08-02', 'y', 'john'),
('2016-08-03', '2016-08-04', 'y', 'harry'),
('2016-08-05', '2016-08-06', 'y', 'mac')
The query:
SELECT DISTINCT id,
CASE WHEN EXISTS
(
SELECT 1
FROM #T t2
WHERE t1.Id = t2.Id
-- make sure it's a different name (the question asks about multiple names)
AND t1.name <> t2.name
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
)
THEN 'Yes'
ELSE 'No'
END As multiple_flag
FROM #T t1
Results:
id multiple_flag
---- -------------
x Yes
y No
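A minimal Python sketch of the same idea: two inclusive ranges overlap when each one starts no later than the other ends, and an ID is flagged when two of its rows with different names overlap. It follows the question's wording (multiple names) and uses the sample data above; overlaps and multiple_flags are hypothetical names.

```python
from datetime import date
from itertools import combinations

# (fromdate, todate, ID, name) from the sample data.
ROWS = [
    (date(2016, 8, 1), date(2016, 8, 7), "x", "jack"),
    (date(2016, 8, 3), date(2016, 8, 4), "x", "tom"),
    (date(2016, 8, 5), date(2016, 8, 6), "x", "jack"),
    (date(2016, 8, 1), date(2016, 8, 2), "y", "john"),
    (date(2016, 8, 3), date(2016, 8, 4), "y", "harry"),
    (date(2016, 8, 5), date(2016, 8, 6), "y", "mac"),
]


def overlaps(a_from, a_to, b_from, b_to):
    """Inclusive ranges overlap iff each starts no later than the other ends."""
    return a_from <= b_to and b_from <= a_to


def multiple_flags(rows):
    flags = {row[2]: "no" for row in rows}
    for a, b in combinations(rows, 2):
        same_id, different_name = a[2] == b[2], a[3] != b[3]
        if same_id and different_name and overlaps(a[0], a[1], b[0], b[1]):
            flags[a[2]] = "yes"
    return flags


print(multiple_flags(ROWS))  # {'x': 'yes', 'y': 'no'}
```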

Difference in consecutive rows in SQL

I have a table of integer numbers. There is no specific starting number, but each row should be 2000 more than the row above it. I want to find, with a query, where the difference between two consecutive rows is not 2000. Could you please help me with this? The comparison would be as follows:
Row 1 = 1000
Row 2 = 3000
Row 3 = 4000
Row 4 = 6000
Row 5 = 7000
Then only rows 3 and 5 should be output, because their difference from the previous row is not 2000: row 3 is compared with row 2, and row 5 with row 4.
My data looks like this:
Label Formorder date
test 480000 3/31/2015
test2 481000 3/31/2014
test3 482000 3/31/2015
test4 483000 3/31/2014
If you have SQL Server 2012 or above, you can use the LAG function.
LAG gives you the value from the previous row; compare it with the current row to see whether it is 2000 lower:
WITH diffs as (
SELECT rowValue,
rowValue - LAG(rowValue) OVER (ORDER BY rowValue) diff
FROM dataTable)
SELECT rowValue
FROM diffs
WHERE diff <> 2000
http://sqlfiddle.com/#!6/59d28/2
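The LAG query can be tried locally through Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the first release with window functions). dataTable and rowValue are the answer's placeholder names, and the values are the ones from the question.

```python
import sqlite3

# The answer's LAG query, unchanged apart from an ORDER BY for stable output.
QUERY = """
WITH diffs AS (
    SELECT rowValue,
           rowValue - LAG(rowValue) OVER (ORDER BY rowValue) AS diff
    FROM dataTable)
SELECT rowValue
FROM diffs
WHERE diff <> 2000   -- the first row's diff is NULL, so it is filtered out too
ORDER BY rowValue
"""


def values_not_2000_after_previous(values):
    con = sqlite3.connect(":memory:")  # needs SQLite 3.25+ for LAG
    con.execute("CREATE TABLE dataTable (rowValue INTEGER)")
    con.executemany("INSERT INTO dataTable VALUES (?)", [(v,) for v in values])
    result = [v for (v,) in con.execute(QUERY)]
    con.close()
    return result


print(values_not_2000_after_previous([1000, 3000, 4000, 6000, 7000]))  # [4000, 7000]
```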
Possible solution:
create table #t (id int, v int)
insert into #t values
(1, 1000),
(2, 3000),
(4, 7000),
(6, 9000),
(11, 11000),
(17, 17000)
select * from #t t1
outer apply(select * from (
select top 1 id as previd, v as prevv
from #t t2
where t2.id < t1.id
order by id desc)t
where t1.v - t.prevv <> 2000) oa
where oa.prevv is not null
Output:
id v previd prevv
4 7000 2 3000
17 17000 11 11000
If the ID (row) has no gaps:
select t2.*
from dataTable t1
join dataTable t2
on t2.ID = t1.ID + 1
and t2.value <> t1.value + 2000
A JOIN-based approach should work on SQL Server 2008.
In the query below, row numbers are added to the source data. Then, an inner join connects the current row to the previous row if and only if the previous row's value is not exactly 2000 less than the current row.
WITH Data AS (
SELECT *, RowNumber = ROW_NUMBER() OVER (ORDER BY rowValue)
FROM dataTable
)
SELECT n.rowValue
FROM data n
JOIN data p ON p.RowNumber = n.RowNumber - 1
AND p.rowValue != n.rowValue - 2000
http://sqlfiddle.com/#!3/59d28/10
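The ROW_NUMBER self-join can be checked the same way with Python's sqlite3 module, again assuming SQLite 3.25 or newer for ROW_NUMBER. The values are borrowed from the OUTER APPLY answer's sample, where 7000 and 17000 follow a step other than 2000.

```python
import sqlite3

# Row numbers follow rowValue order; each row is joined to the previous one
# only when the step between them is not exactly 2000.
QUERY = """
WITH Data AS (
    SELECT rowValue, ROW_NUMBER() OVER (ORDER BY rowValue) AS RowNumber
    FROM dataTable
)
SELECT n.rowValue
FROM Data n
JOIN Data p ON p.RowNumber = n.RowNumber - 1
           AND p.rowValue != n.rowValue - 2000
ORDER BY n.rowValue
"""


def values_after_irregular_step(values):
    con = sqlite3.connect(":memory:")  # ROW_NUMBER needs SQLite 3.25+
    con.execute("CREATE TABLE dataTable (rowValue INTEGER)")
    con.executemany("INSERT INTO dataTable VALUES (?)", [(v,) for v in values])
    result = [v for (v,) in con.execute(QUERY)]
    con.close()
    return result


print(values_after_irregular_step([1000, 3000, 7000, 9000, 11000, 17000]))  # [7000, 17000]
```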