How to detect a faulty sequence column with SQL?

I have this table
ID | Seq
------------
A 1
A 2
A 3
B 1
B 2
B 3
B 3 <--duplicate seq where ID=B
C 1
C 2
C 4 <--missing seq id number 3
D 1
D 2
. .
. .
Is there a way to detect errors in the logic of the Seq column, specifically jumps (gaps) and/or duplicates?

Try this. It should work in both SQL Server and Oracle:
select ID,seq
from(
select ID,seq,
row_number() over (partition by id order by seq ) rn
from t_seq)a
where a.seq<>a.rn
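To try the ROW_NUMBER comparison without a SQL Server or Oracle instance, here is a minimal sketch that runs the same query through Python's sqlite3 module. The in-memory database and the sample rows are assumptions copied from the question, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Assumption: an in-memory SQLite table standing in for t_seq from the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_seq (id TEXT, seq INTEGER)")
conn.executemany(
    "INSERT INTO t_seq VALUES (?, ?)",
    [("A", 1), ("A", 2), ("A", 3),
     ("B", 1), ("B", 2), ("B", 3), ("B", 3),   # duplicate seq for B
     ("C", 1), ("C", 2), ("C", 4)],            # seq 3 missing for C
)

# A row is suspicious when its seq differs from its position within the ID.
query = """
SELECT id, seq
FROM (
    SELECT id, seq,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq) AS rn
    FROM t_seq
) a
WHERE a.seq <> a.rn
ORDER BY id, seq
"""
faulty = conn.execute(query).fetchall()
print(faulty)
```

On this sample data the query reports B 3 (the duplicate) and C 4 (the row after the gap). Rows that follow the first fault within an ID may be reported as well, because their row number stays shifted.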

Both of these queries use only standard SQL, so they should work in just about any RDBMS.
This will check for a break in the sequence:
select t1.id, t1.seq
from t_seq t1
where
t1.seq <> 1
and not exists (
select *
from t_seq t2
where t2.id = t1.id
and t2.seq = t1.seq - 1
)
This will check for duplicates:
select t1.id, t1.seq
from t_seq t1
group by t1.id, t1.seq
having count(*) > 1
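As a quick check of the two portable queries, the sketch below runs them against the question's sample rows using Python's sqlite3 module. The in-memory table is an assumption for illustration; neither query needs window functions.

```python
import sqlite3

# Assumption: an in-memory SQLite table standing in for t_seq.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_seq (id TEXT, seq INTEGER)")
conn.executemany(
    "INSERT INTO t_seq VALUES (?, ?)",
    [("A", 1), ("A", 2), ("A", 3),
     ("B", 1), ("B", 2), ("B", 3), ("B", 3),
     ("C", 1), ("C", 2), ("C", 4)],
)

# Break in the sequence: a row other than seq 1 whose predecessor is absent.
gaps = conn.execute("""
    SELECT t1.id, t1.seq
    FROM t_seq t1
    WHERE t1.seq <> 1
      AND NOT EXISTS (
          SELECT 1 FROM t_seq t2
          WHERE t2.id = t1.id AND t2.seq = t1.seq - 1
      )
    ORDER BY t1.id, t1.seq
""").fetchall()

# Duplicates: the same (id, seq) pair occurring more than once.
duplicates = conn.execute("""
    SELECT id, seq
    FROM t_seq
    GROUP BY id, seq
    HAVING COUNT(*) > 1
    ORDER BY id, seq
""").fetchall()

print(gaps, duplicates)
```

The gap query returns the row that follows each gap (C 4 here), not the missing value itself.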

To get the duplicates you can use the following T-SQL.
SELECT ID, Seq FROM MyTable GROUP BY ID, Seq HAVING COUNT(Seq) > 1
Edit
To find the missing sequence numbers, I have updated the code provided by njr101 as follows:
SELECT ID, Seq FROM MyTable t1 WHERE ID IN (
SELECT ID FROM MyTable
GROUP BY ID
HAVING COUNT(DISTINCT Seq) <> MAX(Seq)
) AND t1.seq <> 1 AND NOT EXISTS (
SELECT * FROM MyTable t2 WHERE t2.id=t1.id AND t2.seq = t1.seq - 1
)
ORDER BY ID
The first subquery counts the distinct Seq values for each ID (ignoring duplicates). If that count equals the maximum Seq for the ID, every value from 1 to MAX(Seq) is present (assuming sequences start at 1), so the ID is fine. If the two differ, the ID has at least one gap and is returned by the subquery.
The second part (with the help of njr101's query) filters the result set down to the rows that come immediately after a gap, i.e. the ID and Seq before which the missing values should be inserted. Results below:
My Data
=========
A 1
A 2
A 3
A 20 <--- Missing (displayed in results)
B 1
B 2
B 3
B 3
B 4
C 1
C 2
C 4 <--- Missing (displayed in results)
C 5
C 15 <--- Missing (displayed in results)
C 16
Results
=======
A 20
C 4
C 15
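The listing above can be reproduced with a small sketch that loads "My Data" into an in-memory SQLite table via Python's sqlite3 module. The table name MyTable comes from the answer; the SQLite setup and the extra Seq in the ORDER BY (added only to make the output order deterministic) are assumptions.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the "My Data" listing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID TEXT, Seq INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("A", 1), ("A", 2), ("A", 3), ("A", 20),
     ("B", 1), ("B", 2), ("B", 3), ("B", 3), ("B", 4),
     ("C", 1), ("C", 2), ("C", 4), ("C", 5), ("C", 15), ("C", 16)],
)

# Keep only IDs with a gap, then report the rows that follow each gap.
after_gap = conn.execute("""
    SELECT ID, Seq FROM MyTable t1
    WHERE ID IN (
        SELECT ID FROM MyTable
        GROUP BY ID
        HAVING COUNT(DISTINCT Seq) <> MAX(Seq)
    )
    AND t1.Seq <> 1
    AND NOT EXISTS (
        SELECT 1 FROM MyTable t2
        WHERE t2.ID = t1.ID AND t2.Seq = t1.Seq - 1
    )
    ORDER BY ID, Seq
""").fetchall()
print(after_gap)
```

B is skipped entirely: it has a duplicate but no gap, so its distinct count (4) equals its maximum Seq (4).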

Related

SQL Server - Sum of difference between rows in a table

I have a table in the format :
SomeID SomeData
1 3
2 7
3 9
4 10
5 14
6 16
. .
. .
I want to find the sum of the differences between consecutive pairs of rows in this table, i.e. ((7-3) + (10-9) + (16-14) + ...).
What is the best way to do this?
Using a self join along with the modulus:
SELECT SUM(t1.SomeData - t2.SomeData) AS total_diff
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.SomeID = t2.SomeID + 1
WHERE t1.SomeID % 2 = 0;
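For a quick check of the self-join, here is a minimal sketch using Python's sqlite3 module with the six sample rows from the question. The in-memory table is an assumption; the table name yourTable is the placeholder used in the answer.

```python
import sqlite3

# Assumption: in-memory SQLite table with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (SomeID INTEGER, SomeData INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(1, 3), (2, 7), (3, 9), (4, 10), (5, 14), (6, 16)],
)

# Pair every even SomeID with the row just before it and sum the differences.
total_diff = conn.execute("""
    SELECT SUM(t1.SomeData - t2.SomeData) AS total_diff
    FROM yourTable t1
    INNER JOIN yourTable t2
        ON t1.SomeID = t2.SomeID + 1
    WHERE t1.SomeID % 2 = 0
""").fetchone()[0]
print(total_diff)  # (7-3) + (10-9) + (16-14)
```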
This answer assumes that the SomeID sequence in fact starts with 1 and increments by 1 with each subsequent row. If not, then we might be able to first apply ROW_NUMBER over SomeID and generate a 1 to N sequence.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY SomeID) rn
FROM yourTable
)
SELECT SUM(t1.SomeData - t2.SomeData) AS total_diff
FROM cte t1
INNER JOIN cte t2
ON t1.rn = t2.rn + 1
WHERE t1.rn % 2 = 0;
You can use the ROW_NUMBER window function to generate a serial number, take it modulo 2 to split the rows into two groups, and then use conditional aggregation.
Query 1:
SELECT SUM(CASE WHEN rn = 0 THEN SomeData END) - SUM(CASE WHEN rn = 1 THEN SomeData END)
FROM (
SELECT SomeData,ROW_NUMBER() over(order by SomeID) % 2 rn
FROM t t1
) t1
Results:
| |
|---|
| 7 |
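The conditional-aggregation query does not depend on SomeID being contiguous, which the sketch below illustrates with Python's sqlite3 module. The IDs 10, 20, ... 60 are made up for this example (the SomeData values are the question's), and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Assumption: same SomeData values as the question, but non-contiguous IDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (SomeID INTEGER, SomeData INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(10, 3), (20, 7), (30, 9), (40, 10), (50, 14), (60, 16)],
)

# rn is 1 for odd-positioned rows and 0 for even-positioned rows.
result = conn.execute("""
    SELECT SUM(CASE WHEN rn = 0 THEN SomeData END)
         - SUM(CASE WHEN rn = 1 THEN SomeData END)
    FROM (
        SELECT SomeData,
               ROW_NUMBER() OVER (ORDER BY SomeID) % 2 AS rn
        FROM t
    ) t1
""").fetchone()[0]
print(result)  # (7 + 10 + 16) - (3 + 9 + 14)
```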

Group by and filter on 2 Distinct values in SQL

I have table T1
ID Size
A 1
A 2
A 3
B 3
B 4
C 2
C 4
I want to group by ID and filter the smallest size for each ID
Desired outcome:
A 1
B 3
C 2
I tried doing something like this:
SELECT ID, Size
FROM T1
WHERE ID IN (SELECT DISTINCT ID FROM T1)
You want a basic GROUP BY query:
SELECT ID, MIN(Size) AS Size
FROM T1
GROUP BY ID;
SELECT T1.id, MIN(T1.SIZE) AS MinimumSize FROM T1 GROUP BY T1.ID
This may be the solution you are looking for.

SQL get the closest two rows within duplicate rows

I have following table
ID Name Stage
1 A 1
1 B 2
1 C 3
1 A 4
1 N 5
1 B 6
1 J 7
1 C 8
1 D 9
1 E 10
Given the parameters A and N, I need the output below:
ID Name Stage
1 A 4
1 N 5
That is, I need to select the A and N rows whose difference in Stage is smallest.
This query can make use of an index on (name, stage) efficiently:
WITH cte AS (
SELECT TOP 1
a.id AS a_id, a.name AS a_name, a.stage AS a_stage
, n.id AS n_id, n.name AS n_name, n.stage AS n_stage
FROM tbl a
CROSS APPLY (
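SELECT * FROM (
SELECT TOP 1 *, stage - a.stage AS diff
FROM tbl
WHERE name = 'N'
AND stage >= a.stage
ORDER BY stage
) n_after -- nearest N at or after this A
UNION ALL
SELECT * FROM (
SELECT TOP 1 *, a.stage - stage AS diff
FROM tbl
WHERE name = 'N'
AND stage < a.stage
ORDER BY stage DESC
) n_before -- nearest N before this A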
) n
WHERE a.name = 'A'
ORDER BY diff
)
SELECT a_id AS id, a_name AS name, a_stage AS stage FROM cte
UNION ALL
SELECT n_id, n_name, n_stage FROM cte;
SQL Server uses CROSS APPLY in place of standard-SQL LATERAL.
In case of ties (equal difference) the winner is arbitrary, unless you add more ORDER BY expressions as tiebreaker.
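For comparison, here is a minimal, portable sketch of the same idea run through Python's sqlite3 module. It is not the CROSS APPLY query above: it simply joins every A row to every N row and keeps the pair with the smallest stage difference, using SQLite's LIMIT instead of TOP. The table name tbl comes from the answer; the in-memory setup and the tiebreaker columns are assumptions.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the question's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT, stage INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "A", 1), (1, "B", 2), (1, "C", 3), (1, "A", 4), (1, "N", 5),
     (1, "B", 6), (1, "J", 7), (1, "C", 8), (1, "D", 9), (1, "E", 10)],
)

# Closest (A, N) pair; a.stage and n.stage act as explicit tiebreakers.
closest = conn.execute("""
    SELECT a.name, a.stage, n.name, n.stage
    FROM tbl a
    JOIN tbl n ON n.name = 'N'
    WHERE a.name = 'A'
    ORDER BY ABS(a.stage - n.stage), a.stage, n.stage
    LIMIT 1
""").fetchone()
print(closest)
```

This compares all A/N combinations, so it is simpler but cannot use an index on (name, stage) the way the lateral version can.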
This solution works if you know the minimum difference is always 1:
SELECT *
FROM myTable AS a
CROSS JOIN myTable AS b
WHERE a.name = 'A'
AND b.name = 'N'
AND ABS(a.stage - b.stage) = 1;
a.ID a.Name a.Stage b.ID b.Name b.Stage
1 A 4 1 N 5
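Or, more generally, if you don't know the minimum:
SELECT *
FROM myTable AS a
CROSS JOIN myTable AS b
WHERE a.name = 'A'
AND b.name = 'N'
AND ABS(a.stage - b.stage) = (SELECT MIN(ABS(a2.stage - b2.stage))
FROM myTable AS a2
CROSS JOIN myTable AS b2
WHERE a2.name = 'A'
AND b2.name = 'N')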

SQL select columns group by

If I have a table which is of the following format:
ID NAME NUM TIMESTAMP BOOL
1 A 5 09:50 TRUE
1 B 6 13:01 TRUE
1 A 1 10:18 FALSE
2 A 3 12:20 FALSE
1 A 1 05:30 TRUE
1 A 12 06:00 TRUE
How can I get the ID, NAME and NUM for each unique (ID, NAME) pair, taken from the row with the latest TIMESTAMP where BOOL=TRUE?
So for the above table the output should be:
ID NAME NUM
1 A 5
1 B 6
I tried using GROUP BY, but I cannot get around the fact that I either need to put an aggregate function around NUM (MAX and MIN will not work for this example) or include it in the GROUP BY (which ends up grouping on ID, NAME and NUM combined). As far as I can tell, both will break in some cases.
PS: I am using SQL Developer (that is the SQL developed by Oracle I think, sorry I am a newbie at this)
If you're using at least SQL-Server 2005 you can use the ROW_NUMBER function:
WITH CTE AS
(
SELECT ID, NAME, NUM,
ROW_NUMBER() OVER (PARTITION BY ID, NAME ORDER BY TIMESTAMP DESC) AS RN
FROM Table1
WHERE BOOL='TRUE'
)
SELECT ID, NAME, NUM FROM CTE
WHERE RN = 1
Result:
ID NAME NUM
1 A 5
1 B 6
Here's the fiddle: http://sqlfiddle.com/#!3/a1dc9/10/0
select t1.* from TABLE1 t1 inner join
(
select ID, NAME, max(TIMESTAMP) as TIMESTAMP from TABLE1
where BOOL='TRUE'
group by ID, NAME
) t2
on t1.id=t2.id and t1.name=t2.name and t1.timestamp=t2.timestamp
where t1.BOOL='TRUE'
select t1.*
from TABLE1 t1
left join
TABLE1 t2
on t1.id=t2.id and t1.name=t2.name and t2.BOOL='TRUE' and t1.TIMESTAMP<t2.TIMESTAMP
where t1.BOOL='TRUE' and t2.id is null
should do it for you.
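To check the "latest TRUE row per (ID, NAME)" logic against the question's data, here is a minimal sketch using Python's sqlite3 module. It uses a NOT EXISTS anti-join, which expresses the same idea as the LEFT JOIN ... IS NULL form. The in-memory table t and the column names ts and flag (stand-ins for TIMESTAMP and BOOL) are assumptions chosen to avoid reserved words.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the question's table;
# ts stands in for TIMESTAMP and flag for BOOL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, name TEXT, num INTEGER, ts TEXT, flag TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, "A", 5, "09:50", "TRUE"),
     (1, "B", 6, "13:01", "TRUE"),
     (1, "A", 1, "10:18", "FALSE"),
     (2, "A", 3, "12:20", "FALSE"),
     (1, "A", 1, "05:30", "TRUE"),
     (1, "A", 12, "06:00", "TRUE")],
)

# Keep a TRUE row only if no later TRUE row exists for the same (id, name).
latest = conn.execute("""
    SELECT t1.id, t1.name, t1.num
    FROM t t1
    WHERE t1.flag = 'TRUE'
      AND NOT EXISTS (
          SELECT 1 FROM t t2
          WHERE t2.id = t1.id
            AND t2.name = t1.name
            AND t2.flag = 'TRUE'
            AND t2.ts > t1.ts
      )
    ORDER BY t1.id, t1.name
""").fetchall()
print(latest)
```

The zero-padded HH:MM strings compare correctly as text, which is why ts can be stored as TEXT in this sketch.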

How to subtract rows of one table from another only once

I'm working on a university project, and I have the following question:
I have 2 tables in an Oracle DB. I need to select the rows from table1 that are not included in table2. The main problem is that a row in table2 that has already been matched once must not be used to exclude another row. For example:
Table1                    Table2                    ResultTable
id | Number | Letter      id | Number | Letter      id | Number | Letter
_____________________     _____________________     _____________________
1    4        S           1    6        G           2    2        P
2    2        P           2    8        B           3    5        B
3    5        B           3    4        S           4    4        S
4    4        S           4    1        A           6    2        P
5    1        A           5    1        H
6    2        P           6    2        X
So, as you can see, if a row from Table1 has a "twin" in Table2, both are excluded.
Probably the most thorough query is this:
SELECT table1.id,
table1.digit,
table1.letter
FROM ( SELECT id,
digit,
letter,
ROW_NUMBER() OVER (PARTITION BY digit, letter ORDER BY id) rn
FROM table1
) table1
LEFT
JOIN ( SELECT id,
digit,
letter,
ROW_NUMBER() OVER (PARTITION BY digit, letter ORDER BY id) rn
FROM table2
) table2
ON table2.digit = table1.digit
AND table2.letter = table1.letter
AND table2.rn = table1.rn
WHERE table2.id IS NULL
ORDER
BY table1.id
;
which gives each record in table1 and table2 a "row number" within its group of "twins". For example, this:
SELECT id,
digit,
letter,
ROW_NUMBER() OVER (PARTITION BY digit, letter ORDER BY id) rn
FROM table1
ORDER
BY table1.id
;
returns this:
ID DIGIT LETT RN
---------- ---------- ---- ----------
1 4 S 1
2 2 P 1
3 5 B 1
4 4 S 2 -- second row with 4 S
5 1 A 1
6 2 P 2 -- second row with 2 P
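The row-number matching can be checked end to end with a minimal sketch in Python's sqlite3 module, loading the question's Table1 and Table2 and running the full anti-join. The column names digit and letter follow the answer; the in-memory setup is an assumption, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Assumption: in-memory SQLite copies of the question's two tables.
conn = sqlite3.connect(":memory:")
for name in ("table1", "table2"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER, digit INTEGER, letter TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 4, "S"), (2, 2, "P"), (3, 5, "B"), (4, 4, "S"), (5, 1, "A"), (6, 2, "P")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [(1, 6, "G"), (2, 8, "B"), (3, 4, "S"), (4, 1, "A"), (5, 1, "H"), (6, 2, "X")],
)

# The n-th twin in table1 is cancelled only by the n-th twin in table2.
remaining = conn.execute("""
    SELECT t1.id, t1.digit, t1.letter
    FROM (SELECT id, digit, letter,
                 ROW_NUMBER() OVER (PARTITION BY digit, letter ORDER BY id) AS rn
          FROM table1) t1
    LEFT JOIN (SELECT id, digit, letter,
                      ROW_NUMBER() OVER (PARTITION BY digit, letter ORDER BY id) AS rn
               FROM table2) t2
        ON  t2.digit = t1.digit
        AND t2.letter = t1.letter
        AND t2.rn = t1.rn
    WHERE t2.id IS NULL
    ORDER BY t1.id
""").fetchall()
print(remaining)
```

Only the first (4, S) row and the single (1, A) row find a partner with the same row number in table2, so the other four rows remain, matching the ResultTable in the question.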
That said, if you know that no (digit, letter) can ever appear more than once in table2, you can simplify this considerably by using EXISTS instead of ROW_NUMBER():
SELECT id,
digit,
letter
FROM table1 table1a
WHERE EXISTS
( SELECT 1
FROM table1
WHERE digit = table1a.digit
AND letter = table1a.letter
AND id < table1a.id
)
OR NOT EXISTS
( SELECT 1
FROM table2
WHERE digit = table1a.digit
AND letter = table1a.letter
)
;
Break it into parts.
Perhaps you have an EOR - Exclusive OR.
So you might have
(condition1
OR
condition2)
AND NOT
(condition1 AND condition2).
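Consider the Oracle MINUS operator. Note, however, that MINUS works on distinct rows, so it removes every copy of a matching row rather than only one, and the id column would have to be left out of the comparison. See http://oreilly.com/catalog/mastorasql/chapter/ch07.html for more detail.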
I can't see how to do what you want with one SQL SELECT.
I think you'll need a temporary table, and several statements.
Call it tmpResults, with id1 and id2, which match the id from Table1 and the id from Table2 respectively.
-- get matched rows - this is too many, we'll delete some later.
INSERT INTO tmpResults (id1, id2)
SELECT Table1.id id1, Table2.id id2
FROM Table1 INNER JOIN Table2
ON Table1.Number = Table2.Number AND Table1.Letter = Table2.Letter;
-- Delete where Table1 has matched more than 1 row
DELETE tmpResults
WHERE rowid IN
(SELECT tmpResults.RowId
FROM tmpResults
INNER JOIN
(SELECT id1, MAX(id2) id2m FROM tmpResults GROUP BY id1 HAVING count(*) > 1) m1
ON tmpResults.id1 = m1.id1 AND tmpResults.id2 = m1.id2m );
-- Delete where Table2 has matched more than 1 row
DELETE tmpResults
WHERE rowid IN
(SELECT tmpResults.RowId
FROM tmpResults
INNER JOIN
(SELECT MAX(id1) id1m, id2 FROM tmpResults GROUP BY id2 HAVING count(*) >1) m2
ON tmpResults.id1 = m2.id1m AND tmpResults.id2 = m2.id2 );
-- now tmpResults should have unique matches only, so we want Table1 where there is no match
SELECT Table1.*
FROM Table1
LEFT JOIN tmpResults
ON table1.id = tmpResults.id1
WHERE tmpResults.id2 IS NULL;