Deleting records with a digit repeating more than 5 times - SQL

I have data in a table, with values of length 9, like:
999999969
000000089
666666689
I want to delete only those rows in which any digit from 1 to 9 repeats more than 5 times.

OK, so the logic here can be summed up as:
Find the longest run of the same consecutive digit in any given number; and
Return true if that longest run is more than 5 digits long.
Right?
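As a quick illustration of those two steps outside the database, here is a minimal Python sketch. The helper name `has_long_run` is hypothetical. It uses `itertools.groupby`, which yields one group per run of identical consecutive characters, and like the SQL below it counts runs of any digit, including 0.

```python
from itertools import groupby

def has_long_run(value: str, limit: int = 5) -> bool:
    """Return True if some character repeats consecutively more than `limit` times."""
    # groupby splits the string into runs of identical consecutive characters;
    # measure each run and keep the longest (0 for an empty string).
    longest = max((len(list(run)) for _, run in groupby(value)), default=0)
    return longest > limit

for v in ["999999969", "000000089", "666666689", "15552555"]:
    print(v, has_long_run(v))
```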
So, let's split it into runs of consecutive digits:
regress=> SELECT regexp_matches('666666689', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g');
regexp_matches
----------------
{6666666}
{8}
{9}
(3 rows)
Then filter for the longest:
regress=>
SELECT x[1]
FROM regexp_matches('6666666898', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x
ORDER BY length(x[1]) DESC
LIMIT 1;
x
---------
6666666
(1 row)
...but we don't actually care about the longest run itself, only whether any run is longer than 5 digits, so:
SELECT x[1]
FROM regexp_matches('6666666898', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x
WHERE length(x[1]) > 5;
This can be used as an EXISTS test, e.g.:
WITH blah(n) AS (VALUES('999999969'),('000000089'),('666666689'),('15552555'))
SELECT n
FROM blah
WHERE EXISTS (
SELECT x[1]
FROM regexp_matches(n, '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x
WHERE length(x[1]) > 5
);
This is actually pretty efficient and returns the correct result (always nice). But it can be simplified a little more with:
WITH blah(n) AS (VALUES('999999969'),('000000089'),('666666689'),('15552555'))
SELECT n
FROM blah
WHERE EXISTS (
SELECT x[1]
FROM regexp_matches(n, '(0{6}|1{6}|2{6}|3{6}|4{6}|5{6}|6{6}|7{6}|8{6}|9{6})', 'g') x
);
You can use the same WHERE clause in a DELETE.
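To show the same condition driving a DELETE, here is a small sketch that uses Python's built-in `sqlite3` module instead of PostgreSQL. This is an assumption made purely for illustration. SQLite has no built-in regex operator, but it rewrites `X REGEXP Y` as a call to a user function `regexp(Y, X)`, so the sketch registers one backed by Python's `re`. The table `blah` and column `n` are just the sample data from above.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite evaluates "X REGEXP Y" as regexp(Y, X): the pattern comes first, the value second.
conn.create_function(
    "REGEXP", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None,
)

conn.execute("CREATE TABLE blah (n TEXT)")
conn.executemany(
    "INSERT INTO blah VALUES (?)",
    [("999999969",), ("000000089",), ("666666689",), ("15552555",)],
)

# Same test as the EXISTS version: delete rows containing a run of 6+ identical digits.
conn.execute(
    "DELETE FROM blah "
    "WHERE n REGEXP '(0{6}|1{6}|2{6}|3{6}|4{6}|5{6}|6{6}|7{6}|8{6}|9{6})'"
)

remaining = [row[0] for row in conn.execute("SELECT n FROM blah")]
print(remaining)
```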

This can be much simpler with a regular expression using a back reference.
DELETE FROM tbl
WHERE col ~ '([1-9])\1{5}';
That's all.
Explanation:
([1-9]) ... a character class with the digits 1 to 9, parenthesized for the following back reference.
\1 ... a back reference to the first (and, in this case, only) parenthesized subexpression.
{5} ... exactly 5 more times, so the same digit appears 6 times in a row, i.e. "more than 5".
Per the documentation:
A back reference (\n) matches the same string matched by the previous parenthesized subexpression specified by the number n [...] For example, ([bc])\1 matches bb or cc but not bc or cb.
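Python's `re` module supports the same back-reference syntax, so the pattern can be tried outside PostgreSQL. In the minimal sketch below, the helper name `should_delete` is hypothetical, and `re.search` stands in for the unanchored `~` operator. Because the class is `[1-9]`, a run of zeros is not matched, which follows the question's wording.

```python
import re

# ([1-9]) captures one digit from 1 to 9; \1{5} demands that same digit 5 more times,
# so a match means a run of at least 6 identical non-zero digits.
LONG_RUN = re.compile(r"([1-9])\1{5}")

def should_delete(value: str) -> bool:
    # search() is unanchored, like PostgreSQL's ~ operator.
    return LONG_RUN.search(value) is not None

for v in ["999999969", "000000089", "666666689", "15552555"]:
    print(v, should_delete(v))
```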

Terrible in terms of performance, but it should work:
DELETE FROM YOURTABLE
WHERE YOURDATA LIKE '%111111%'
OR YOURDATA LIKE '%222222%'
OR YOURDATA LIKE '%333333%'
OR YOURDATA LIKE '%444444%'
OR YOURDATA LIKE '%555555%'
OR YOURDATA LIKE '%666666%'
OR YOURDATA LIKE '%777777%'
OR YOURDATA LIKE '%888888%'
OR YOURDATA LIKE '%999999%'
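Each `LIKE '%dddddd%'` above is a plain substring test, and the chain of ORs asks whether any of them holds. The same logic, sketched in Python with a hypothetical helper name:

```python
def matches_like_chain(value: str) -> bool:
    # '%111111%' OR '%222222%' OR ... OR '%999999%': six copies of some digit 1-9
    # appear consecutively somewhere in the value.
    return any(str(d) * 6 in value for d in range(1, 10))

for v in ["999999969", "000000089", "666666689", "15552555"]:
    print(v, matches_like_chain(v))
```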

Related

NOT IN is not working as expected with Listagg function

Below is the DDL of the table:
create or replace table tempdw.blk_table
(
db_name varchar,
tbl_expr varchar
);
insert into tempdw.blk_table values ('edw','ABC%');
insert into tempdw.blk_table values ('edw','EFG%');
select * from tempdw.blk_table;
The code below is not working; the expected output should return no rows:
select * from tempdw.blk_table where tbl_expr not in (
select regexp_replace(regexp_replace(replace(listagg(tbl_expr,','),',','\',\''),'^','\''),'$','\'') from tempdw.blk_table);
When I run the code below it works fine. I'm trying to understand why the code above doesn't work:
select * from tempdw.blk_table where tbl_expr NOT IN('ABC%','EFG%');
Au contraire: the code is working just fine. You don't understand the difference between a string that contains commas and a list of strings.
Unfortunately, it is rather hard to figure out what you do want to do, because your question does not explain that.
I can speculate that you want something like:
select bt.*
from blk_table bt
where db_name like tbl_expr;
This is just a guess, however.
with data as (
select * from values ('edw','ABC%'),('edw','ABC%') v(db_name,tbl_expr )
)
select * from data
where tbl_expr not in (
select regexp_replace(regexp_replace(replace(listagg(tbl_expr,','),',','\',\''),'^','\''),'$','\'') from data);
does indeed give the results you don't want, i.e.:
DB_NAME TBL_EXPR
edw ABC%
edw ABC%
This is because your subquery returns only one row: you have aggregated the two input rows into one.
REGEXP_REPLACE( REGEXP_REPLACE( REPLACE( LISTAGG( TBL_EXPR,','),',','\',\''),'^','\''),'$','\'')
'ABC%','ABC%'
NOT IN is an exact match, so if we switch from strings to numbers:
SELECT num, num in (2,3,4) FROM values (1),(3),(5) v(num);
gives:
NUM NUM IN (2,3,4)
1 0
3 1
5 0
So your NOT IN only returns strings that are not equal to the single value in your list. That one value is the aggregate of all the inputs (the string 'ABC%','ABC%'), so no individual input can ever equal it, and every row is returned.
Back to strings:
SELECT str
,str in ('str_a', 'str_b')
,str not in ('str_a', 'str_b')
from values ('a'),('str_b') v(str);
gives:
STR STR IN ('STR_A', 'STR_B') STR NOT IN ('STR_A', 'STR_B')
a 0 1
str_b 1 0
Hence the results you are getting.
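The same distinction can be seen in a rough Python analogy, which ignores SQL's NULL handling: a list holding one comma-joined string is not the same as a list holding separate strings. The values are the two rows from the question.

```python
rows = ["ABC%", "EFG%"]

# What the subquery produces: ONE value that happens to contain quotes and a comma.
one_value = ["'" + "','".join(rows) + "'"]       # ["'ABC%','EFG%'"]
# What the literal list in the working query means: TWO separate values.
two_values = ["ABC%", "EFG%"]

print([r for r in rows if r not in one_value])    # every row survives
print([r for r in rows if r not in two_values])   # no row survives
```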
Now, I suspect you want LIKE-type behavior or a regex match, but since you are building the list, you know what you are doing there.
Also note that what you are doing:
listagg(tbl_expr,',') AS a
,replace(a,',','\',\'') AS b
,regexp_replace(b,'^','\'') AS c
,regexp_replace(c,'$','\'') AS d
has the same effect as:
listagg('\'' || tbl_expr || '\'',',')
...unless you want strings with embedded commas to become independent "list" items.
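The equivalence, and the one case where the two approaches differ, can be checked with a Python sketch of the string operations. The function names are hypothetical, and the order of LISTAGG's output is assumed to follow the input order.

```python
def replace_chain(values):
    a = ",".join(values)          # listagg(tbl_expr, ',')
    b = a.replace(",", "','")     # replace(a, ',', '\',\'')
    return "'" + b + "'"          # the two regexp_replace calls on ^ and $

def quote_then_join(values):
    # listagg('\'' || tbl_expr || '\'', ',')
    return ",".join("'" + v + "'" for v in values)

print(replace_chain(["ABC%", "EFG%"]) == quote_then_join(["ABC%", "EFG%"]))  # True
print(replace_chain(["A,B"]))     # 'A','B'  (the embedded comma splits one item into two)
print(quote_then_join(["A,B"]))   # 'A,B'
```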

How to replace characters at specific position in several words using REGEX_REPLACE

I have a query similar to this:
SELECT YEAR_CODE FROM YEAR_CODES
and it returns several records: typically 1 but sometimes 2 or 3. The returned records look like this: 2018FOO, 2019BAR
I need to get the matching previous year of the returned codes. For instance:
2018FOO becomes 2017FOO
2019BAR becomes 2018BAR
Looking for something similar to:
REGEX_REPLACE(SELECT YEAR_CODE FROM YEAR_CODES, 4th character, 4th character minus 1)
You don't need regexp_replace(); using the substr() string function with concat() (or the concatenation operator ||) is enough:
with year_codes(year_code) as
(
select '2018FOO' from dual union all
select '2019BAR' from dual
)
select concat(substr(year_code,1,4) - 1,substr(year_code,-3)) as year_code
from year_codes;
YEAR_CODE
---------
2017FOO
2018BAR
A to_number() conversion is redundant, since Oracle implicitly converts a string composed entirely of digits to a number in an arithmetic operation.
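The same arithmetic, sketched in Python with a hypothetical helper name. It assumes every code starts with a four-digit year. Here the suffix is taken from position 5 onward rather than as the last 3 characters, so suffixes of any length are kept.

```python
def previous_year_code(code: str) -> str:
    # Roughly concat(substr(code, 1, 4) - 1, substr(code, 5)):
    # read the leading 4 characters as a year, subtract 1, keep the suffix.
    return str(int(code[:4]) - 1) + code[4:]

for c in ["2018FOO", "2019BAR"]:
    print(c, "->", previous_year_code(c))
```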
You can use string operations:
with c as (
<your query here>
)
select yc.*
from year_codes yc
join c
on to_number(substr(yc.year_code, 1, 4)) = to_number(substr(c.year_code, 1, 4)) - 1
and substr(yc.year_code, 5) = substr(c.year_code, 5)

to_number from char sql

I have to select only the IDs that contain only even digits (an ID looks like p19, p20, etc.). That is, p20 is good (both 2 and 0 are even digits); p18 is not.
I thought of using substr to get each digit from the IDs and then checking whether it is even.
select from profs
where to_number(substr(id_prof,2,2))%2=0 and to_number(substr(id_prof,3,2))%2=0;
If you need all rows that consist of a 'p' at the beginning followed only by even digits, it should look like:
select *
from profs
where regexp_like (id_prof, '^p[24680]+$');
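The same anchored pattern can be tried in Python, where `fullmatch` plays the role of the `^...$` anchors. The helper name is hypothetical.

```python
import re

# A leading 'p' followed by one or more even digits, and nothing else.
EVEN_ID = re.compile(r"p[24680]+")

def is_even_id(prof_id: str) -> bool:
    # fullmatch requires the whole string to match, like '^p[24680]+$'.
    return EVEN_ID.fullmatch(prof_id) is not None

for p in ["p18", "p20", "p24", "p53"]:
    print(p, is_even_id(p))
```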
with
profs ( prof_id ) as (
select 'p18' from dual union all
select 'p24' from dual union all
select 'p53' from dual
)
-- End of test data; what is above this line is NOT part of the solution.
-- The solution (SQL query) begins here.
select *
from profs
where length(prof_id) = length(translate(prof_id, '013579', '0'));
PROF_ID
-------
p24
This solution should work faster than anything using regular expressions. All it does is replace 0 with itself and delete all odd digits from the input string. (The '0' is included because of a strange but documented behavior of translate(): the third argument can't be empty.) If the length of the input string doesn't change after the translation, the input string didn't have any odd digits.
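Python's `str.translate` supports the same "delete these characters and compare lengths" idea. `str.maketrans` accepts a third argument listing characters to delete, so no dummy `'0' -> '0'` pair is needed. This is a sketch with a hypothetical helper name. Like the SQL version, it only looks for odd digits, so an ID with no digits at all would also pass.

```python
# Deletion table: every odd digit maps to None, i.e. is removed.
DROP_ODD = str.maketrans("", "", "13579")

def has_only_even_digits(prof_id: str) -> bool:
    # If deleting odd digits does not shorten the string, there were none.
    return len(prof_id) == len(prof_id.translate(DROP_ODD))

for p in ["p18", "p24", "p53"]:
    print(p, has_only_even_digits(p))
```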
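The following checks whether the numeric part as a whole is even (that is, only its last digit), so it would also accept p18:
where mod(to_number(regexp_replace(id_prof, '[^[:digit:]]', '')),2) = 0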

Retrieving specific occurrences of a given Regex with Oracle SQL

In a simplified form, I'm attempting to retrieve either the first occurrence of the '.*?=(.*?);.*' regex, or the second, or the third -- that is, either x or y or z (that is, I want to be able to hardcode in this query that I want the first, second or third values) in this following example:
select regexp_replace(
'margin=x;margin=y;margin=z;',
'.*?=(.*?);.*',
'\1',
1 -- occurrences. I thought that picking 1, 2 or 3 would solve my problem?
) from dual;
-- This returns "xyz", which is terrible. I was expecting it to return "x", in this case.
Looking at the Oracle documentation, I thought this would be relatively straightforward, as the last parameter (occurrences) apparently allows me to select which groups to take into consideration. But it doesn't! Why?
I'm going off to a completely different solution. Would combining a hierarchical substring select with a regexp_replace be an option for your needs?
This way you could either select one or multiple values, depending on your needs. You wouldn't need to write a concatenating regex value, and you could adjust the select a bit more to your needs.
select regexp_replace(subselect.val, '.*=(.*?);', '\1') -- remove "margin="
from (select regexp_substr(
'margin=x;margin=y;margin=z;',
'.*?=(.*?);',
1,
level) val,
level lvl
from dual
connect by regexp_substr('margin=x;margin=y;margin=z;',
'.*?=(.*?);',
1,
level) is not null) subselect -- This select represents each margin=T as a single row
where lvl = 1; -- you could define multiple values to select as well.
You need a regex that will match 1 to n occurrences of the whole group. E.g.
([^=]*=([^;]*);){2}.*
With the match replaced by the \2 backreference, this gets the 2nd attribute value. Your regex can also be used (it is nearly equivalent to the pattern above): (.*?=(.*?);){2}.*
If you define the index variable as IDX, you can use something like:
select regexp_replace(
'margin=x;margin=y;margin=z;',
-- Oracle's CONCAT accepts only two arguments, so || is used to build the pattern
'([^=]*=([^;]*);){' || IDX || '}.*',
'\2'
) from dual;
NOTE: If you want to get an empty string as a result of trying to obtain a non-existing value, add |.* at the end of the regex:
(.*?=(.*?);){4}.*|.*
With your input string, the result will be an empty string (which Oracle represents as NULL).
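The "repeat the whole group n times" trick can be tried in Python, whose `re` module also keeps the capture from the last repetition of a quantified group. This is a sketch under that assumption, with a hypothetical helper name. Python's `re.sub` replaces an unmatched group (`\2` when the `|.*` branch wins) with an empty string.

```python
import re

def nth_value(s: str, idx: int) -> str:
    # {idx} repeats the whole "name=value;" group idx times, so group 2 ends up
    # holding the value of the idx-th pair; '|.*' yields '' if there is no such pair.
    pattern = r"([^=]*=([^;]*);){" + str(idx) + r"}.*|.*"
    return re.sub(pattern, r"\2", s, count=1)

s = "margin=x;margin=y;margin=z;"
for i in range(1, 5):
    print(i, repr(nth_value(s, i)))
```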
Perhaps all you need is this: the fourth parameter is NOT the occurrence but the POSITION from which the search starts. The FIFTH parameter is the occurrence.
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm
Also, are you sure you want REPLACE and not SUBSTR?
EDITED: To clarify (it seems at least one person was confused). I show a possible solution to what you need (perhaps) at the end, but first let's look at REGEXP_REPLACE. I rewrote your query to use different occurrences; I put the index in a CTE, but you can instead make idx into a bind variable, or any other mechanism you need to use. As you will see, the output makes no sense.
with t1 ( idx ) as (select 1 from dual union all select 2 from dual
union all select 3 from dual)
select idx,
regexp_replace('margin=x;margin=y;margin=z;', '.*?=(.*?);.*', '\1', 1, idx) as val
from t1;
Output:
IDX VAL
---------- -----------------------
1 xmargin=y;margin=z;
2 margin=x;ymargin=z;
3 margin=x;margin=y;z
3 rows selected.
I guess this is not what you needed, but it demonstrates what was wrong in your query. The fourth argument to REGEXP_REPLACE, 1 in all cases in the query above, is the position from which the search begins. The fifth argument, idx, is the occurrence. This query replaces the first, second, or third occurrence with the subexpression, which is probably not what you wanted.
If you need to extract x, y, or z depending on the occurrence number, you must use REGEXP_SUBSTR, not REGEXP_REPLACE. Note also that I changed the match pattern: the .*? at the beginning and the .* at the end are unnecessary. If you want to find x, y or z in something like margin=x; but not in length=x;, you must make that explicit, and the match pattern should be 'margin=(.*?);'.
with t1 ( idx ) as (select 1 from dual union all select 2 from dual
union all select 3 from dual)
select idx,
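-- args after the pattern: start position 1, occurrence idx, no match parameter, return subexpression 1
regexp_substr('margin=x;margin=y;margin=z;', '=(.*?);', 1, idx, null, 1) as val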
from t1;
Output:
IDX VAL
---------- -------
1 x
2 y
3 z
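A rough Python analogue of `REGEXP_SUBSTR(s, '=(.*?);', 1, idx, NULL, 1)` collects the first subexpression of every match and then picks the idx-th one. Returning `None` for a missing occurrence stands in for Oracle's NULL. The helper name is hypothetical.

```python
import re

def nth_margin(s: str, idx: int):
    # findall returns group 1 of each non-overlapping match, in order.
    values = re.findall(r"=(.*?);", s)
    # 1-based occurrence, like REGEXP_SUBSTR; None when it does not exist.
    return values[idx - 1] if 1 <= idx <= len(values) else None

s = "margin=x;margin=y;margin=z;"
for i in range(1, 5):
    print(i, nth_margin(s, i))
```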

How to use special characters in SQL Server LIKE clause

I have a table from which I want to get strings like ab, aabb, aaabbb, ...: 'a' repeated n times followed by 'b' repeated n times, as shown below.
Example table:
value
----------
ab
aabb
aaabbb
aaaabbbb
1
1a
abababa
I want the result TABLE to be:
value
----------
ab
aabb
aaabbb
aaaabbbb
I've tried this:
select * from [NumTest] where value LIKE '[a]+[b]+'
but it's returning zero rows.
Can anybody help me with how to use special characters in SQL Server's LIKE?
Here is something that can work:
(EDIT: after the OP's comment, the commented-out parts are not needed.)
--WITH CTE_GoodValues AS
--(
SELECT value
FROM Table1
WHERE LEFT(VALUE,LEN(VALUE)/2) = REPLICATE('a',LEN(VALUE)/2)
AND RIGHT(VALUE,LEN(VALUE)/2) = REPLICATE('b',LEN(VALUE)/2)
AND LEN(VALUE)%2=0
--)
--SELECT REPLICATE(' ', (SELECT MAX(LEN(VALUE))/2 FROM CTE_GoodValues)- LEN(VALUE)/2) + VALUE
--FROM CTE_GoodValues
In the CTE, select values whose left half is all a's and whose right half is all b's. Then find the MAX length and use it to replicate the needed spaces in front (this last step is the commented-out part).
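SQL Server's LIKE supports only the %, _, [...] and [^...] wildcards and has no repetition quantifier, so in '[a]+[b]+' each + is a literal character and the pattern only matches four-character strings such as a+b+. The half-split test above can be sketched in Python as follows. The helper name is hypothetical, and the sketch also rejects the empty string, which the T-SQL version does not check.

```python
def is_a_n_b_n(value: str) -> bool:
    half = len(value) // 2
    # Mirrors LEN(VALUE)%2=0 plus LEFT(...)=REPLICATE('a',...) and RIGHT(...)=REPLICATE('b',...).
    return (
        len(value) > 0
        and len(value) % 2 == 0
        and value[:half] == "a" * half
        and value[half:] == "b" * half
    )

for v in ["ab", "aabb", "aaabbb", "aaaabbbb", "1", "1a", "abababa"]:
    print(v, is_a_n_b_n(v))
```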