Splitting of string by fixed keyword - sql

Hi I currently have a tables with a column that I would like to split.
ID Serial
1 AAA"-A01-AU-234-U_xyz(CY)(REV-002)
2 AAA"-A01-AU-234-U(CY)(REV-1)
3 AAA"-A01-AU-234-U(CY)(REV-101)
4 VVV"-01-AU-234-Z_ww(REV-001)
5 VVV"-01-AU-234-Z(REV-001)_xyz(CY)
6 V-VV"-01-AU-234-Z(REV-03)_xyz(CY)
7 V-VV"-01-AU-234-Z-ZZZ(REV-004)_xyz(CY)
I would like to split this column into 2 field via a select statement
The first field would consist of the text from the start and end when this scenario is satisfied
After the first "-
take all text till the next 3 hypen (-)
Take the first letter after the last hypen(-)
The second field would want to store the Value(Int) inside the (REV) bracket. Rev is always stored inside a compassing bracket (Rev-xxx) the number may stretch from 0-999 and have different form of representation
Example of output
Field 1 Field 2
AAA"-A01-AU-234-U 2
AAA"-A01-AU-234-U 1
AAA"-A01-AU-234-U 101
VVV"-01-AU-234-Z 1
VVV"-01-AU-234-Z 1
V-VV"-01-AU-234-Z 3
V-VV"-01-AU-234-Z 4

Maybe it is possible to make it better and faster, but at least it does work. If i will have some time more i will look at this again to think of better solution, but it do the job.
create table #t
(
id int,
serial nvarchar(255)
)
go
insert into #t values (1, 'AAA"-A01-AU-234-U_xyz(CY)(REV-002)')
insert into #t values (2, 'AAA"-A01-AU-234-U(CY)(REV-1)')
insert into #t values (3, 'AAA"-A01-AU-234-U(CY)(REV-101)')
insert into #t values (4, 'VVV"-01-AU-234-Z_ww(REV-001)')
insert into #t values (5, 'VVV"-01-AU-234-Z(REV-001)_xyz(CY)')
insert into #t values (6, 'VVV"-01-AU-234-Z(REV-03)_xyz(CY)')
insert into #t values (7, 'VVV"-01-AU-234-Z(REV-004)_xyz(CY)')
go
select id, serial,
left(serial,charindex('-', serial, charindex('-', serial, charindex('-', serial, charindex('"',serial) + 2) +1) + 1) + 1) as 'Field2'
,cast( replace(left(right(serial, len(serial) - charindex('REV',serial) +1 ), CHARINDEX(')',right(serial, len(serial) - charindex('REV',serial) +1 )) - 1), 'REV-', '')as int) as 'Field1'
from #t
go
gives me:
id serial Field2 Field1
1 AAA"-A01-AU-234-U_xyz(CY)(REV-002) AAA"-A01-AU-234-U 2
2 AAA"-A01-AU-234-U(CY)(REV-1) AAA"-A01-AU-234-U 1
3 AAA"-A01-AU-234-U(CY)(REV-101) AAA"-A01-AU-234-U 101
4 VVV"-01-AU-234-Z_ww(REV-001) VVV"-01-AU-234-Z 1
5 VVV"-01-AU-234-Z(REV-001)_xyz(CY) VVV"-01-AU-234-Z 1
6 VVV"-01-AU-234-Z(REV-03)_xyz(CY) VVV"-01-AU-234-Z 3
7 VVV"-01-AU-234-Z(REV-004)_xyz(CY) VVV"-01-AU-234-Z 4

I came up with a solution in php using regular expressions.I am trying to convert it into posix standards supported by mysql.Anyways in the meanwhile you can have a look at this and it works perfect.
/The first script select the values for fields 1 namely AAA"-A01-AU-234-U/
<?php
$txt='VVV"-01-AU-234-Z(REV-001)_xyz(CY)';
$re1='((?:[a-z][a-z0-9_]*))';
$re2='.*?';
$re3='(\\d+)';
$re4='.*?';
$re5='((?:[a-z][a-z0-9_]*))';
$re6='.*?';
$re7='(\\d+)';
$re8='.*?';
$re9='([a-z])';
echo $re1.$re2.$re3.$re4.$re5.$re6.$re7.$re8.$re9;
if ($c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6.$re7.$re8.$re9."/is", $txt, $matches))
{
$var1=$matches[1][0];
$int1=$matches[2][0];
$var2=$matches[3][0];
$int2=$matches[4][0];
$w1=$matches[5][0];
print "($var1) ($int1) ($var2) ($int2) ($w1) \n";
}
?>
/*The second script selects values for field 2 namely the last integer*/
<?php
$txt='VVV"-01-AU-234-Z_ww(REV-001)';
$re1='.*?';
$re2='\\d';
$re3='.*?';
$re4='\\d';
$re5='.*?';
$re6='\\d';
$re7='.*?';
$re8='\\d';
$re9='.*?';
$re10='\\d';
$re11='.*?';
$re12='\\d';
$re13='.*?';
$re14='\\d';
$re15='(\\d)';
if ($c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6.$re7.$re8.$re9.$re10.$re11.$re12.$re13.$re14.$re15."/is", $txt, $matches))
{
$d1=$matches[1][0];
print "($d1) \n";
}
?>
OUTPUT:
(VVV) (01) (AU) (234) (Z) //script 1
(1) //script 2
You can add database connection to the script and store the results in a new table.You can aslo iterate each row as input to the script and store corresponding results in the table.
Note:
The regular expression used for selecting field 1:
((?:[a-z][a-z0-9_]*)).*?(\d+).*?((?:[a-z][a-z0-9_]*)).*?(\d+).*?([a-z])
The regular expression used for selecting field 2:
.*?\d.*?\d.*?\d.*?\d.*?\d.*?\d.*?\d(\d)
If anybody can convert the above expressions to posix standards then the user can write a simple query like
select t.serial as field 1 from table t
where t.serial regexp 'converted exp' join
(select t1.serial as field 2 from table t1
where t1.serial regexp 'converted exp')q
on q.id=t.id;
I tried to convert it but the matching constraints were lost.You should actually change ?: to ^ and ? to [^>] and //d to [0-9] or digit.Hope it helps.

Try this solution. It uses a combination of charindex and the substring function.
DECLARE #TempTable table
(
id int,
serial nvarchar(255)
)
insert into #TempTable values (1, 'AAA"-A01-AU-234-U_xyz(CY)(REV-002)')
insert into #TempTable values (2, 'AAA"-A01-AU-234-U(CY)(REV-1)')
insert into #TempTable values (3, 'AAA"-A01-AU-234-U(CY)(REV-101)')
insert into #TempTable values (4, 'VVV"-01-AU-234-Z_ww(REV-001)')
insert into #TempTable values (5, 'VVV"-01-AU-234-Z(REV-001)_xyz(CY)')
insert into #TempTable values (6, 'VVV"-01-AU-234-Z(REV-03)_xyz(CY)')
insert into #TempTable values (7, 'VVV"-01-AU-234-Z(REV-004)_xyz(CY)')
select
id,
serial,
substring(serial, 1, P4.Pos+1) as field1,
convert(int, substring(Serial, P6.Pos , P7.Pos - P6.Pos)) as field2
from #TempTable
cross apply (select (charindex('-', Serial))) as P1(Pos)
cross apply (select (charindex('-', Serial, P1.Pos+1))) as P2(Pos)
cross apply (select (charindex('-', Serial, P2.Pos+1))) as P3(Pos)
cross apply (select (charindex('-', Serial, P3.Pos+1))) as P4(Pos)
cross apply (select (charindex('REV-', Serial,P1.Pos+1)+4)) as P6(Pos)
--+4 because 'REV-' is 4 chars long
cross apply (select (charindex(')', Serial,P6.Pos+1))) as P7(Pos);

I have updated my answer. Is this better now?
DECLARE #Table table(ID int, SERIAL nvarchar(100));
INSERT INTO #Table(ID, SERIAL)
VALUES ('1', 'AAA"-A01-AU-234-U_xyz(CY)(REV-002)'),
('2', 'AAA"-A01-AU-234-U(CY)(REV-1)'),
('3', 'AAA"-A01-AU-234-U(CY)(REV-101)'),
('4', 'VVV"-01-AU-234-Z_ww(REV-001)'),
('5', 'VVV"-01-AU-234-Z(REV-001)_xyz(CY)'),
('6', 'VVV"-01-AU-234-Z(REV-03)_xyz(CY)'),
('7', 'VVV"-01-AU-234-Z(REV-004)_xyz(CY)'),
('8', 'AAA"-A01-AU-234-U-1111-(REV-111)'),
('9', 'AAA"-A01-AU-234-U-111111-5555(CY)(REV-101)'),
('10', 'V-VV"-01-AU-234-Z-ZZZ(REV-004)_xyz(CY)')
SELECT
ID,
SERIAL,
LEFT(SERIAL, P5.Pos + 1) AS Field1,
CONVERT(int, SUBSTRING(SERIAL, P6.Pos, CHARINDEX(')', RIGHT(SERIAL, LEN(SERIAL) - P6.Pos)))) AS Field2
FROM #Table
CROSS APPLY (SELECT CHARINDEX('"-', SERIAL)) AS P1(Pos)
CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P1.Pos + 1)) AS P2(Pos)
CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P2.Pos + 1)) AS P3(Pos)
CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P3.Pos + 1)) AS P4(Pos)
CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P4.Pos + 1)) AS P5(Pos)
CROSS APPLY (SELECT CHARINDEX('REV-', SERIAL, P5.Pos + 1) + 4) AS P6(Pos)

Related

Find data by multiple Lookup table clauses

declare #Character table (id int, [name] varchar(12));
insert into #Character (id, [name])
values
(1, 'tom'),
(2, 'jerry'),
(3, 'dog');
declare #NameToCharacter table (id int, nameId int, characterId int);
insert into #NameToCharacter (id, nameId, characterId)
values
(1, 1, 1),
(2, 1, 3),
(3, 1, 2),
(4, 2, 1);
The Name Table has more than just 1,2,3 and the list to parse on is dynamic
NameTable
id | name
----------
1 foo
2 bar
3 steak
CharacterTable
id | name
---------
1 tom
2 jerry
3 dog
NameToCharacterTable
id | nameId | characterId
1 1 1
2 1 3
3 1 2
4 2 1
I am looking for a query that will return a character that has two names. For example
With the above data only "tom" will be returned.
SELECT *
FROM nameToCharacterTable
WHERE nameId in (1,2)
The in clause will return every row that has a 1 or a 3. I want to only return the rows that have both a 1 and a 3.
I am stumped I have tried everything I know and do not want to resort to dynamic SQL. Any help would be great
The 1,3 in this example will be a dynamic list of integers. for example it could be 1,3,4,5,.....
Filter out a count of how many times the Character appears in the CharacterToName table matching the list you are providing (which I have assumed you can convert into a table variable or temp table) e.g.
declare #Character table (id int, [name] varchar(12));
insert into #Character (id, [name])
values
(1, 'tom'),
(2, 'jerry'),
(3, 'dog');
declare #NameToCharacter table (id int, nameId int, characterId int);
insert into #NameToCharacter (id, nameId, characterId)
values
(1, 1, 1),
(2, 1, 3),
(3, 1, 2),
(4, 2, 1);
declare #RequiredNames table (nameId int);
insert into #RequiredNames (nameId)
values
(1),
(2);
select *
from #Character C
where (
select count(*)
from #NameToCharacter NC
where NC.characterId = c.id
and NC.nameId in (select nameId from #RequiredNames)
) = 2;
Returns:
id
name
1
tom
Note: Providing DDL+DML as shown here makes it much easier for people to assist you.
This is classic Relational Division With Remainder.
There are a number of different solutions. #DaleK has given you an excellent one: inner-join everything, then check that each set has the right amount. This is normally the fastest solution.
If you want to ensure it works with a dynamic amount of rows, just change the last line to
) = (SELECT COUNT(*) FROM #RequiredNames);
Two other common solutions exist.
Left-join and check that all rows were joined
SELECT *
FROM #Character c
WHERE EXISTS (SELECT 1
FROM #RequiredNames rn
LEFT JOIN #NameToCharacter nc ON nc.nameId = rn.nameId AND nc.characterId = c.id
HAVING COUNT(*) = COUNT(nc.nameId) -- all rows are joined
);
Double anti-join, in other words: there are no "required" that are "not in the set"
SELECT *
FROM #Character c
WHERE NOT EXISTS (SELECT 1
FROM #RequiredNames rn
WHERE NOT EXISTS (SELECT 1
FROM #NameToCharacter nc
WHERE nc.nameId = rn.nameId AND nc.characterId = c.id
)
);
A variation on the one from the other answer uses a windowed aggregate instead of a subquery. I don't think this is performant, but it may have uses in certain cases.
SELECT *
FROM #Character c
WHERE EXISTS (SELECT 1
FROM (
SELECT *, COUNT(*) OVER () AS cnt
FROM #RequiredNames
) rn
JOIN #NameToCharacter nc ON nc.nameId = rn.nameId AND nc.characterId = c.id
HAVING COUNT(*) = MIN(rn.cnt)
);
db<>fiddle

Find rows that contain same value inside comma separated values

I have a varchar column, populated by another process where I have no control over, that is filled with comma separated values.
Now I need to find all rows where part of this column exists in that same column, in another row
example
declare #table table (value varchar(50))
insert into #table values ('NB,BD,FR'), ('BD,GK'), ('SL,SR')
select * from #table
so the table contains
value
-----
NB,BD,FR
BD,GK
SL,SR
from the example above I would like to get
value
-----
NB,BD,FR
BD,GK
Because there is a value (in this case BD but can be anything) present in both rows
Can this be done in sql?
You could use clunky XML manipulation to convert comma separated values to rows:
DECLARE #table TABLE (value VARCHAR(50));
INSERT INTO #table VALUES
('NB,BD,FR'),
('BD,GK'),
('SL,SR');
WITH cte AS (
SELECT value, node.value('.', 'varchar(10)') AS substr
FROM #table
CROSS APPLY (SELECT CAST('<x>' + REPLACE(value, ',', '</x>,<x>') + '</x>' AS XML)) AS x(doc)
CROSS APPLY doc.nodes('/x') AS n(node)
)
-- use your favorite technique to find the duplicate
SELECT value
FROM cte AS m
WHERE EXISTS (
SELECT 1
FROM cte AS x
WHERE value <> m.value AND substr = m.substr
)
The CAST(... AS XML) part assumes that your data does not contain characters that have special meaning in XML. The nodes method will convert one row to many, rest is straight forward.
This is the wrong data structure. Don't store values in strings!
declare #table table (id int, value varchar(50));
insert into #table
values (1, 'NB'), (1, 'BD'), (1, 'FR'),
(2, 'BD'), (2, 'GK'),
(3, 'SL'), (3, 'SR');
Then you can get what you want using window functions:
select id, value
from (select t.*, max(cnt) over (partition by id) as max_cnt
from (select t.*, count(*) over (partition by value) as cnt
from #table t
) t
) t
where max_cnt >= 2

SQL Coalesce with missing values

I have two tables, master and child. The master's primary key MM is an INT. The child table has a compound key of two columns and value column:
MM (INT)
POS (INT, values 1-32)
VV (INT, values 1-9)
Sample master table data:
(1, other data)
(2, other data)
(3, other data)
Sample child table data
(1, 1,2)
(1, 2,2)
(1, 4,1)
(1,15,1)
(2, 4,5)
(2, 5,3)
(2,31,7)
(3,3,1)
(4,18,2)
{4,19,5)
For a report I could like to de-normalize the data with an output like this:
(1,'22010000000000010000000000000000')
(2,'00053000000000000000000000000070')
(3,'00100000000000000000000000000000')
(4,'00000000000000000025000000000000')
I was thinking to use a select query with coalesce like this but the output is not not exactly what I want:
(1,'22110')
(2,'537')
(3,'1')
(4,'25')
How do I fill in the missing data with zeros?
One way I can think to do this uses a decimal value with a precision of 32 and sum() and then convert back to a zero-padded string:
select mm,
right(replicate('0', 32) + cast(sum(val) as varchar(32)), 32)
from (select c.*,
cast(cast(val as varchar(32)) + replicate('0', 32 - pos) as decimal(32, 0)) as val
from child c
) c
group by mm;
EDIT:
The above isn't generalizable (say, above 38 characters or to use letters as well as digits). Here is a more generalizable, but longer version:
select c.mm,
(max(case when pos = 1 then valc else '0' end) +
max(case when pos = 2 then valc else '0' end) +
max(case when pos = 3 then valc else '0' end) +
. . .
max(case when pos = 32 then valc else '0' end) +
)
from (select c.*, cast(val as varchar(255)) as valc
from child c
) c
group by c.mm;
I should note that if you want to handle a master with no children, then use a left join. That aspect of the problem seems less interesting than combining the values in the appropriate positions.
Try it like this
DECLARE #master TABLE(MM INT,OtherData VARCHAR(100));
INSERT INTO #master VALUES
(1, 'Other Data 1')
,(2, 'Other Data 2')
,(3, 'Other Data 3');
DECLARE #child TABLE(MM INT, POS INT, VV INT)
INSERT INTO #child VALUES
(1, 1,2)
,(1, 2,2)
,(1, 4,1)
,(1,15,1)
,(2, 4,5)
,(2, 5,3)
,(2,31,7)
,(3,3,1)
,(4,18,2)
,(4,19,5);
--One CTE to get 32 numbers
WITH Numbers(Nr) AS
(SELECT TOP 32 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM sys.objects) --get 32 numbers
--another CTE to get distinct MMs
,MMs AS
(
SELECT c.MM
,m.OtherData
FROM #child AS c
LEFT JOIN #master AS m ON c.MM=m.MM
GROUP BY c.MM,m.OtherData
)
--In this CTE The CROSS JOIN with the Numbers will create a list of 32 rows, which carry in all positions with a corresponding child its number. COALESCE will set a zero in the place of all NULLs
,Masked AS
(
SELECT MMs.MM
,MMs.OtherData
,Nr
,COALESCE(VV,0) AS Val
FROM MMs
CROSS JOIN Numbers
LEFT JOIN #child AS c1 ON c1.MM=MMs.MM AND c1.POS=Nr
)
-The final SELECT uses FOR XML PATH to get the 32 numbers in rows back to a string
SELECT *
,(
SELECT Masked.Val AS [*]
FROM Masked
WHERE Masked.MM=MMs.MM
FOR XML PATH('')
)
FROM MMs
The result
1 22010000000000100000000000000000
2 00053000000000000000000000000070
3 00100000000000000000000000000000
4 00000000000000000250000000000000

Select records with order of IN clause

I have
SELECT * FROM Table1 WHERE Col1 IN(4,2,6)
I want to select and return the records with the specified order which i indicate in the IN clause
(first display record with Col1=4, Col1=2, ...)
I can use
SELECT * FROM Table1 WHERE Col1 = 4
UNION ALL
SELECT * FROM Table1 WHERE Col1 = 6 , .....
but I don't want to use that, cause I want to use it as a stored procedure and not auto generated.
I know it's a bit late but the best way would be
SELECT *
FROM Table1
WHERE Col1 IN( 4, 2, 6 )
ORDER BY CHARINDEX(CAST(Col1 AS VARCHAR), '4,2,67')
Or
SELECT CHARINDEX(CAST(Col1 AS VARCHAR), '4,2,67')s_order,
*
FROM Table1
WHERE Col1 IN( 4, 2, 6 )
ORDER BY s_order
You have a couple of options. Simplest may be to put the IN parameters (they are parameters, right) in a separate table in the order you receive them, and ORDER BY that table.
The solution is along this line:
SELECT * FROM Table1
WHERE Col1 IN(4,2,6)
ORDER BY
CASE Col1
WHEN 4 THEN 1
WHEN 2 THEN 2
WHEN 6 THEN 3
END
select top 0 0 'in', 0 'order' into #i
insert into #i values(4,1)
insert into #i values(2,2)
insert into #i values(6,3)
select t.* from Table1 t inner join #i i on t.[in]=t.[col1] order by i.[order]
Replace the IN values with a table, including a column for sort order to used in the query (and be sure to expose the sort order to the calling application):
WITH OtherTable (Col1, sort_seq)
AS
(
SELECT Col1, sort_seq
FROM (
VALUES (4, 1),
(2, 2),
(6, 3)
) AS OtherTable (Col1, sort_seq)
)
SELECT T1.Col1, O1.sort_seq
FROM Table1 AS T1
INNER JOIN OtherTable AS O1
ON T1.Col1 = O1.Col1
ORDER
BY sort_seq;
In your stored proc, rather than a CTE, split the values into table (a scratch base table, temp table, function that returns a table, etc) with the sort column populated as appropriate.
I have found another solution. It's similar to the answer from onedaywhen, but it's a little shorter.
SELECT sort.n, Table1.Col1
FROM (VALUES (4), (2), (6)) AS sort(n)
JOIN Table1
ON Table1.Col1 = sort.n
I am thinking about this problem two different ways because I can't decide if this is a programming problem or a data architecture problem. Check out the code below incorporating "famous" TV animals. Let's say that we are tracking dolphins, horses, bears, dogs and orangutans. We want to return only the horses, bears, and dogs in our query and we want bears to sort ahead of horses to sort ahead of dogs. I have a personal preference to look at this as an architecture problem, but can wrap my head around looking at it as a programming problem. Let me know if you have questions.
CREATE TABLE #AnimalType (
AnimalTypeId INT NOT NULL PRIMARY KEY
, AnimalType VARCHAR(50) NOT NULL
, SortOrder INT NOT NULL)
INSERT INTO #AnimalType VALUES (1,'Dolphin',5)
INSERT INTO #AnimalType VALUES (2,'Horse',2)
INSERT INTO #AnimalType VALUES (3,'Bear',1)
INSERT INTO #AnimalType VALUES (4,'Dog',4)
INSERT INTO #AnimalType VALUES (5,'Orangutan',3)
CREATE TABLE #Actor (
ActorId INT NOT NULL PRIMARY KEY
, ActorName VARCHAR(50) NOT NULL
, AnimalTypeId INT NOT NULL)
INSERT INTO #Actor VALUES (1,'Benji',4)
INSERT INTO #Actor VALUES (2,'Lassie',4)
INSERT INTO #Actor VALUES (3,'Rin Tin Tin',4)
INSERT INTO #Actor VALUES (4,'Gentle Ben',3)
INSERT INTO #Actor VALUES (5,'Trigger',2)
INSERT INTO #Actor VALUES (6,'Flipper',1)
INSERT INTO #Actor VALUES (7,'CJ',5)
INSERT INTO #Actor VALUES (8,'Mr. Ed',2)
INSERT INTO #Actor VALUES (9,'Tiger',4)
/* If you believe this is a programming problem then this code works */
SELECT *
FROM #Actor a
WHERE a.AnimalTypeId IN (2,3,4)
ORDER BY case when a.AnimalTypeId = 3 then 1
when a.AnimalTypeId = 2 then 2
when a.AnimalTypeId = 4 then 3 end
/* If you believe that this is a data architecture problem then this code works */
SELECT *
FROM #Actor a
JOIN #AnimalType at ON a.AnimalTypeId = at.AnimalTypeId
WHERE a.AnimalTypeId IN (2,3,4)
ORDER BY at.SortOrder
DROP TABLE #Actor
DROP TABLE #AnimalType
ORDER BY CHARINDEX(','+convert(varchar,status)+',' ,
',rejected,active,submitted,approved,')
Just put a comma before and after a string in which you are finding the substring index or you can say that second parameter.
And first parameter of CHARINDEX is also surrounded by , (comma).

Test the sequentiality of a column with a single SQL query

I have a table that contains sets of sequential datasets, like that:
ID set_ID some_column n
1 'set-1' 'aaaaaaaaaa' 1
2 'set-1' 'bbbbbbbbbb' 2
3 'set-1' 'cccccccccc' 3
4 'set-2' 'dddddddddd' 1
5 'set-2' 'eeeeeeeeee' 2
6 'set-3' 'ffffffffff' 2
7 'set-3' 'gggggggggg' 1
At the end of a transaction that makes several types of modifications to those rows, I would like to ensure that within a single set, all the values of "n" are still sequential (rollback otherwise). They do not need to be in the same order according to the PK, just sequential, like 1-2-3 or 3-1-2, but not like 1-3-4 or 1-2-3-3-4.
Due to the fact that there might be thousands of rows within a single set I would prefer to do it in the db to avoid the overhead of fetching the data just for verification after making some small changes.
Also there is the issue of concurrency. The way locking in InnoDB (repeatable read) works (as I understand) is that if I have an index on "n" then InnoDB also locks the "gaps" between values. If I combine set_ID and n to a single index, would that eliminate the problem of phantom rows appearing?
Looks to me like a common problem. Any brilliant ideas?
Thanks!
Note: using MySQL + InnoDB
Look for sequences where max - min + 1 > count:
IF EXISTS (SELECT set_ID
FROM mytable
GROUP BY set_ID
HAVING MAX(n) - MIN(n) + 1 > COUNT(n)
)
ROLLBACK
If the sequence must start at 1, do this instead:
IF EXISTS (SELECT set_ID
FROM mytable
GROUP BY set_ID
HAVING MIN(n) = 1 AND MAX(n) > COUNT(n)
)
ROLLBACK
You also need to avoid duplicate sequence numbers. But this can be done by creating a unique key on set_ID and n.
Try this:
IF EXISTS (SELECT set_ID
FROM mytable
GROUP BY set_ID
HAVING MIN(n) = 1 AND MAX(n) <> COUNT(DISTINCT n)
)
ROLLBACK
works on SQL Server (I don't have MySql to try it out):
DECLARE #YourTable table (ID int, set_ID char(5), some_column char(10),n int)
INSERT #YourTable VALUES (1, 'set-1' ,'aaaaaaaaaa' ,1)
INSERT #YourTable VALUES (2, 'set-1' ,'bbbbbbbbbb' ,2)
INSERT #YourTable VALUES (3, 'set-1' ,'cccccccccc' ,3)
INSERT #YourTable VALUES (4, 'set-2' ,'dddddddddd' ,1)
INSERT #YourTable VALUES (5, 'set-2' ,'eeeeeeeeee' ,2)
INSERT #YourTable VALUES (6, 'set-3' ,'ffffffffff' ,2)
INSERT #YourTable VALUES (7, 'set-3' ,'gggggggggg' ,1)
INSERT #YourTable VALUES (8, 'set-3' ,'ffffffffff' ,4)
INSERT #YourTable VALUES (9, 'set-3' ,'ffffffffff' ,4)
--this will list all "bad" sets
SELECT set_ID
FROM #YourTable
GROUP BY set_ID
HAVING MIN(n) = 1 AND MAX(n) <> COUNT(DISTINCT n)
OUTPUT:
set_ID
------
set-3