SQL: Finding strings with duplicate characters

I have a list of strings:
HEAWAMFWSP
TLHHHAFWSP
AWAMFWHHAW
AUAWAMHHHA
Each of these strings represents 5 pairs of 2-character combinations (i.e. HE AW AM FW SP).
What I am looking to do in SQL is to display all strings that have duplication in the pairs.
Take string number 3 from above; AW AM FW HH AW. I need to display this record because it has a duplicate pair (AW).
Is this possible?
Thanks!

Given the current requirements, yes, this is doable. Here's a version that uses a recursive CTE (the text may need to be adjusted for vendor idiosyncrasies), written and tested on DB2. Please note that this will return multiple rows if there are more than 2 instances of a pair in a string, or more than 1 set of duplicates.
WITH RECURSIVE Pair (rowid, start, pair, text) as (
SELECT id, 1, SUBSTR(text, 1, 2), text
FROM SourceTable
UNION ALL
SELECT rowid, start + 2, SUBSTR(text, start + 2, 2), text
FROM Pair
WHERE start < LENGTH(text) - 1)
SELECT Pair.rowid, Pair.pair, Pair.start, Duplicate.start, Pair.text
FROM Pair
JOIN Pair as Duplicate
ON Duplicate.rowid = Pair.rowid
AND Duplicate.pair = Pair.pair
AND Duplicate.start > Pair.start
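As a way to try the recursive CTE without a DB2 instance, here is a minimal sketch that runs the same idea on SQLite through Python's sqlite3 module. The table contents are the four sample strings from the question; the column is renamed to row_id (an assumption, to avoid SQLite's built-in rowid), and SELECT DISTINCT is added so each matching string comes back only once. Other vendors may need syntax adjustments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SourceTable (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO SourceTable (id, text) VALUES (?, ?)",
    [(1, "HEAWAMFWSP"), (2, "TLHHHAFWSP"), (3, "AWAMFWHHAW"), (4, "AUAWAMHHHA")],
)

query = """
WITH RECURSIVE Pair (row_id, start, pair, text) AS (
    -- anchor: the first pair of every string
    SELECT id, 1, SUBSTR(text, 1, 2), text
    FROM SourceTable
    UNION ALL
    -- step: move two characters to the right until the string is used up
    SELECT row_id, start + 2, SUBSTR(text, start + 2, 2), text
    FROM Pair
    WHERE start < LENGTH(text) - 1
)
SELECT DISTINCT Pair.row_id, Pair.text
FROM Pair
JOIN Pair AS Duplicate
  ON Duplicate.row_id = Pair.row_id
 AND Duplicate.pair = Pair.pair
 AND Duplicate.start > Pair.start
ORDER BY Pair.row_id
"""

duplicates = conn.execute(query).fetchall()
print(duplicates)  # [(3, 'AWAMFWHHAW')]
```

Only the third string is returned, because AW appears at positions 1 and 9; the repeated letters in TLHHHAFWSP and AUAWAMHHHA straddle pair boundaries and do not count.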

Here's a not very elegant solution, but it works and returns each row only once no matter how many duplicate pairs it contains. SUBSTRING is the SQL Server function (the Oracle equivalent is SUBSTR). Its arguments are the start position and the length, so every pair uses a length of 2.
select ID, Value
from MyTable
where (substring(Value,1,2) = substring(Value,3,2)
or substring(Value,1,2) = substring(Value,5,2)
or substring(Value,1,2) = substring(Value,7,2)
or substring(Value,1,2) = substring(Value,9,2)
or substring(Value,3,2) = substring(Value,5,2)
or substring(Value,3,2) = substring(Value,7,2)
or substring(Value,3,2) = substring(Value,9,2)
or substring(Value,5,2) = substring(Value,7,2)
or substring(Value,5,2) = substring(Value,9,2)
or substring(Value,7,2) = substring(Value,9,2))

How can I count repeated values in the string in BigQuery?

Example:
I have the following string:
201904,BLANK,201902,BLANK,BLANK,201811,201810,201809
How can I count the number of repeated "BLANK" values that appear consecutively?
In the described example the answer is 2, but what is the query?
Thanks for your help in advance!
Below is for BigQuery Standard SQL (with quick simplified example)
Corrected Version
#standardSQL
WITH `project.dataset.table` AS (
SELECT '201904,BLANK,201902,BLANK,BLANK,201811,201810,201809,BLANK,BLANK,BLANK' value UNION ALL
SELECT '201904,BLANK,201902,BLANK,BLANK,BLANK,201811' UNION ALL
SELECT '201904,BLANK,201902,BLANK,201811,201902,BLANK,201811'
)
SELECT value,
(
SELECT MAX(ARRAY_LENGTH(SPLIT(list))) - 1
FROM UNNEST(REGEXP_EXTRACT_ALL(value || ',', r'(?:BLANK,){1,}')) list
) max_repeated_count
FROM `project.dataset.table`
The idea here is to:
extract all instances of consecutive BLANK
split each such instance into an array of BLANK elements
and finally take the max length of those arrays as the result
This is just a quick approach that came to mind.
Refactored Version
#standardSQL
WITH `project.dataset.table` AS (
SELECT '201904,BLANK,201902,BLANK,BLANK,201811,201810,201809,BLANK,BLANK,BLANK' value UNION ALL
SELECT '201904,BLANK,201902,BLANK,BLANK,BLANK,201811' UNION ALL
SELECT '201904,BLANK,201902,BLANK,201811,201902,BLANK,201811'
)
SELECT value,
(
SELECT MAX(LENGTH(element) - 1)
FROM UNNEST(REGEXP_EXTRACT_ALL(REPLACE(value || ',', 'BLANK', ''), r',+')) element
) max_repeated_count
FROM `project.dataset.table`
Both with output
Row value max_repeated_count
1 201904,BLANK,201902,BLANK,BLANK,201811,201810,201809,BLANK,BLANK,BLANK 3
2 201904,BLANK,201902,BLANK,BLANK,BLANK,201811 3
3 201904,BLANK,201902,BLANK,201811,201902,BLANK,201811 1
The refactored version is slightly different (but the main idea is the same):
it removes all BLANKs (assuming BLANK cannot be part of another element; if it can, the code can easily be adjusted)
then extracts all runs of consecutive commas into an array
and calculates the max length of those runs of commas
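For readers who want to check the logic outside BigQuery, the first query's approach can be sketched in Python. This is only a model of the idea (extract runs of "BLANK," and measure the longest), not BigQuery code; the function name max_consecutive is made up for illustration, and it returns 0 where the SQL would return NULL for a string with no BLANK at all.

```python
import re

def max_consecutive(value, token="BLANK"):
    """Length of the longest run of consecutive `token` elements
    in a comma-separated string (0 if the token never appears)."""
    # Append a comma so a run at the very end is terminated like any other.
    runs = re.findall(rf"(?:{re.escape(token)},)+", value + ",")
    # Each run looks like "BLANK,BLANK,", so counting commas counts elements.
    return max((run.count(",") for run in runs), default=0)

samples = [
    "201904,BLANK,201902,BLANK,BLANK,201811,201810,201809,BLANK,BLANK,BLANK",
    "201904,BLANK,201902,BLANK,BLANK,BLANK,201811",
    "201904,BLANK,201902,BLANK,201811,201902,BLANK,201811",
]
results = [max_consecutive(s) for s in samples]
print(results)  # [3, 3, 1]
```

Like the SQL, this assumes BLANK never occurs as the tail of a longer element.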
Maybe I misunderstood, but can't you simply split by the value you're looking for and subtract 2 (1 for the first element and 1 for counting elements after splitting):
declare t DEFAULT '201904,BLANK,201902,BLANK,BLANK,201811,201810,201809';
SELECT
t as theString,
split(t,'BLANK') as theSplittedString,
array_length(split(t,'BLANK'))-2 as theAmount
The result reads as follows: n > 0 is the amount of repetition, 0 means no repetition, and -1 means the element was not found.
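One thing worth checking before relying on the split approach: it appears to count the total number of occurrences minus one, regardless of whether the occurrences are adjacent. The sketch below models the same arithmetic in Python (split_count is a hypothetical helper name) so the behaviour can be compared on different inputs.

```python
def split_count(value, token="BLANK"):
    """Model of array_length(split(value, token)) - 2."""
    return len(value.split(token)) - 2

question_string = "201904,BLANK,201902,BLANK,BLANK,201811,201810,201809"
no_adjacent = "201904,BLANK,201902,BLANK,201811,201902,BLANK,201811"

print(split_count(question_string))  # 2
print(split_count(no_adjacent))      # 2, although no two BLANKs are adjacent
print(split_count("201904,201902"))  # -1, token not found
```

So the value matches the expected answer for the question's string, but it would differ from a consecutive-run count whenever the BLANKs are spread out.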

SQL QUERY to bring last letter in a string to first letter position using SQL Server

I have a column called Supervisor from a table JobData in a SQL Server database. In this Supervisor column the records are of the format below.
DANNYL
ADITYAG
SAMMYS
BOBBYJ
I want to convert these records to lower case and bring the last letter to first letter. For example, DANNYL should be changed to the format ldanny and this format should be applied to all the remaining records.
Can anyone help me out with a SQL query for this?
You can use the following solution using LEFT and RIGHT to get the parts of the name. By using LOWER you can convert the upper case characters to lower case:
SELECT LOWER(RIGHT(Supervisor, 1) + LEFT(Supervisor, LEN(Supervisor) - 1))
FROM JobData
WHERE LTRIM(RTRIM(Supervisor)) <> ''
-- or using ABS on the length - 1 so the WHERE isn't needed.
SELECT LOWER(RIGHT(Supervisor, 1) + LEFT(Supervisor, ABS(LEN(Supervisor) - 1)))
FROM JobData
Since it looks like the column Supervisor contains empty values you can also use the following solution without calculation and not failing on the empty values:
SELECT LOWER(RIGHT(Supervisor, 1) + REVERSE(SUBSTRING(REVERSE(Supervisor), 2, LEN(Supervisor))))
FROM JobData
... and another possibility using STUFF:
SELECT LOWER(LEFT(STUFF(Supervisor, 1, 0, RIGHT(Supervisor, 1)), LEN(Supervisor)))
FROM JobData
demo on dbfiddle.uk
There is probably a better way to do that, but here is my proposition.
SELECT lower(left(right('DANYL',1)+'DANYL',len('DANYL')))
Using SUBSTRING you can get the expected result:
SELECT LOWER(CONCAT(SUBSTRING(Supervisor, LEN(Supervisor), 1), SUBSTRING(Supervisor, 0, LEN(Supervisor))))
FROM JobData
Demo with the given sample data:
DECLARE @JobData TABLE (Supervisor VARCHAR(100));
INSERT INTO @JobData (Supervisor) VALUES
('DANNYL'), ('ADITYAG'), ('SAMMYS'), ('BOBBYJ');
SELECT LOWER(CONCAT(SUBSTRING(Supervisor, LEN(Supervisor), 1), SUBSTRING(Supervisor, 0, LEN(Supervisor)))) AS Supervisor
FROM @JobData
Output:
ldanny
gaditya
ssammy
jbobby
Like this? :
SELECT
LOWER(CONCAT(SUBSTRING([Supervisor], LEN([Supervisor]), 1),SUBSTRING([Supervisor], 1, ABS(LEN([Supervisor])-1))))
FROM TABLE
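All of the answers above implement the same transformation: take the last character, put it in front of the rest, and lower-case the result. As a quick way to check expected values (including the empty-string case that some of the SQL variants guard against), here is a small Python sketch; rotate_last_to_first is a hypothetical name used only for illustration.

```python
def rotate_last_to_first(name):
    """Move the last character to the front and lower-case the result.
    Slices are used so an empty string is returned unchanged."""
    return (name[-1:] + name[:-1]).lower()

supervisors = ["DANNYL", "ADITYAG", "SAMMYS", "BOBBYJ", ""]
converted = [rotate_last_to_first(s) for s in supervisors]
print(converted)  # ['ldanny', 'gaditya', 'ssammy', 'jbobby', '']
```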

Generate sequential number in SQL - not by using Identity

I am working on a task where my query will produce a fixed width column. One of the fields in the fixed width column needs to be a sequentially generated number.
Below is my query:
select
_row_ord = 40,
_cid = t.client_num,
_segment = 'ABC',
_value =
concat(
'ABC*',
'XX**', --Hierarchical ID number-this field should be sequentially generated
'20*',
'1*','~'
)
from #temp1 t
My output:
Is there a way to declare @num as a parameter that generates numbers sequentially?
PS: The fields inside the CONCAT function are all hardcoded. Only the 'XX', i.e. the sequential number, has to be dynamically generated.
Any help?!
You could create a SEQUENCE object, then call the NEXT VALUE FOR the SEQUENCE in your query.
Something along these lines:
CREATE SEQUENCE dbo.ExportValues
START WITH 1
INCREMENT BY 1 ;
GO
And then:
select
_row_ord = 40,
_cid = t.client_num,
_segment = 'ABC',
_value =
concat(
'ABC*',
RIGHT(CONCAT('000000000000000', NEXT VALUE FOR dbo.ExportValues),15),
'**',
'20*',
'1*','~'
)
from #temp1 t
You'd have to tweak how many zeros there are for the padding and how many digits to trim it to for your requirements. If duplicate values are ok, you could have the SEQUENCE reset periodically. See the documentation for more on that. It's just another line in the CREATE statement.
You can use row_number() -- made a little more complicated because you are zero-padding it:
select _row_ord = 40, _cid = t.client_num, _segment = 'ABC',
_value = concat('ABC*',
right('00' + convert(varchar(255), row_number() over (order by ?)), 2),
'**', --separator that follows the sequentially generated Hierarchical ID number
'20*',
'1*','~'
)
from #temp1 t;
Note that the ? is for the column that specifies the ordering. If you don't care about the ordering of the numbers, use (select null) in place of the ?.
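To see what the zero-padded output should look like, the padding logic can be modelled in Python. This is a sketch under the assumption that the sequential number replaces XX and is followed by '**'; build_values and its width parameter are made-up names. The slice mirrors RIGHT(..., 2), so numbers wider than the field are truncated from the left, just as in the SQL.

```python
def build_values(row_count, width=2):
    """Build the fixed-width segment for rows 1..row_count,
    zero-padding the sequential number to `width` digits."""
    values = []
    for n in range(1, row_count + 1):
        # zfill pads with zeros; the slice keeps only the rightmost `width` digits
        padded = str(n).zfill(width)[-width:]
        values.append(f"ABC*{padded}**20*1*~")
    return values

print(build_values(3))
# ['ABC*01**20*1*~', 'ABC*02**20*1*~', 'ABC*03**20*1*~']
```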

SQL Plus show first four and last 4 numbers in the output

How can I select a column's first 4 digits and last 4 digits and use an "X" placement for all the numbers in between?
Example
SELECT id from users where user_name ='Tom';
Output
5958694850384567
I am trying to get only the first and last 4 numbers with x's as placements to any number that is being masked:
Trying to get it to look like
Output:5958XXXXXXXX4567
Here is my query so far:
SELECT SUBSTR(id, 1, 4) from users
where user_name ='Tom'
Thank you for your time!
Have you considered simply using the LEFT() and RIGHT() functions? These will give you a specific number of characters from the left or right of a given string respectively.
You can also combine those to build your complete string using the CONCAT() function:
SELECT CONCAT(LEFT(id, 4), 'XXXXXXXX', RIGHT(id, 4))
FROM users
WHERE user_name = 'Tom'
Additionally, if you don't always have a given number of characters within the string, you could calculate the middle section of your output as well via the REPLICATE() function and a bit of math:
SELECT CONCAT(LEFT(id, 4), REPLICATE('X', LEN(id) - 8), RIGHT(id, 4))
FROM users
WHERE user_name = 'Tom'
Oracle Version
I didn't realize that you were using Oracle specifically and assumed SQL Server, so I'll provide some similar code to handle this in that flavor:
-- Oracle has no LEFT, RIGHT or LEN; SUBSTR and LENGTH are used instead
SELECT SUBSTR(id, 1, 4) || RPAD('X', LENGTH(id) - 8, 'X') || SUBSTR(id, -4)
FROM users
WHERE user_name = 'Tom'
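The masking arithmetic (keep 4 on each side, fill the middle with length minus 8 X characters) can be checked with a short Python sketch. The function name mask_middle and its parameters are illustrative assumptions; the guard for short values is an extra not present in the SQL above.

```python
def mask_middle(value, keep=4, mask_char="X"):
    """Keep the first and last `keep` characters and mask everything between."""
    if len(value) <= 2 * keep:
        # Nothing in the middle to mask; return the value unchanged.
        return value
    middle = mask_char * (len(value) - 2 * keep)
    return value[:keep] + middle + value[-keep:]

masked = mask_middle("5958694850384567")
print(masked)  # 5958XXXXXXXX4567
```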

Is this simple UPDATE SQL an error waiting to happen? How to rewrite it?

I need to examine the ACCT_NUM values in TABLE_1. If the ACCT_NUM is prefixed by "GF0", then I need to disregard the "GF0" prefix and take the rightmost 7 characters of the remaining string. If this resulting value is not found in account_x_master or CW_CLIENT_STAGE, then the record is to be flagged as an error.
The following seems to do the trick, but I have a concern...
UPDATE
table_1
SET
Error_Ind = 'GW001'
WHERE
LEFT(ACCT_NUM, 3) = 'GF0'
AND RIGHT(SUBSTRING(ACCT_NUM, 4, LEN(ACCT_NUM) - 3), 7) NOT IN
(
SELECT
acct_num
FROM
account_x_master
)
AND RIGHT(SUBSTRING(ACCT_NUM, 4, LEN(ACCT_NUM) - 3), 7) NOT IN
(
SELECT
CW_CLIENT_STAGE.AGS_NUM
FROM
dbo.CW_CLIENT_STAGE
)
My concern is that SQL Server may attempt to perform a SUBSTRING operation
SUBSTRING(ACCT_NUM, 4, LEN(ACCT_NUM) - 3)
that results in a computed negative value, causing the SQL to fail. Of course, this wouldn't fail if the SUBSTRING operation were only applied to those records that are at least 3 characters long, which would always be the case if the
LEFT(ACCT_NUM, 3) = 'GF0'
were applied first. If possible, I'd like to avoid adding new columns to the table. Bonus points for simplicity and less overhead :-)
How can I rewrite this UPDATE SQL to protect against this?
As other people said, your concern is valid.
I'd make two changes to your query.
1) To avoid having negative value in the SUBSTRING parameter we can rewrite it using STUFF:
SUBSTRING(ACCT_NUM, 4, LEN(ACCT_NUM) - 3)
is equivalent to:
STUFF(ACCT_NUM, 1, 3, '')
Instead of extracting the tail of the string, we replace the first three characters with an empty string. If the string is shorter than 3 characters, the result is an empty string.
By the way, if your ACCT_NUM may end with space(s), they will be trimmed by the SUBSTRING version, because LEN doesn't count trailing spaces.
2) Instead of
LEFT(ACCT_NUM, 3) = 'GF0'
use:
ACCT_NUM LIKE 'GF0%'
If you have an index on ACCT_NUM and only a relatively small number of rows start with GF0, then the index can be used. If you wrap the column in a function such as LEFT, the index can't be used.
So, the final query becomes:
UPDATE
table_1
SET
Error_Ind = 'GW001'
WHERE
ACCT_NUM LIKE 'GF0%'
AND RIGHT(STUFF(ACCT_NUM, 1, 3, ''), 7) NOT IN
(
SELECT
acct_num
FROM
account_x_master
)
AND RIGHT(STUFF(ACCT_NUM, 1, 3, ''), 7) NOT IN
(
SELECT
CW_CLIENT_STAGE.AGS_NUM
FROM
dbo.CW_CLIENT_STAGE
)
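The rule that the final query encodes can be written down as a small Python model, which may help when checking edge cases such as a value that is exactly "GF0". This is only a sketch of the logic, not SQL Server behaviour: needs_error_flag and known_accounts are hypothetical names, the set stands in for the combined contents of account_x_master and CW_CLIENT_STAGE, and NULL handling and trailing spaces are ignored.

```python
def needs_error_flag(acct_num, known_accounts):
    """True when acct_num starts with GF0 and the rightmost 7 characters
    of the remainder are not among the known account numbers."""
    if not acct_num.startswith("GF0"):   # models ACCT_NUM LIKE 'GF0%'
        return False
    remainder = acct_num[3:]             # models STUFF(ACCT_NUM, 1, 3, '')
    key = remainder[-7:]                 # models RIGHT(..., 7)
    return key not in known_accounts

known_accounts = {"1234567"}
print(needs_error_flag("GF01234567", known_accounts))    # False, account is known
print(needs_error_flag("GF0991234567", known_accounts))  # False, rightmost 7 match
print(needs_error_flag("GF07654321", known_accounts))    # True, unknown account
print(needs_error_flag("GF0", known_accounts))           # True, empty remainder
print(needs_error_flag("AB", known_accounts))            # False, no GF0 prefix
```

Slicing never fails on short strings, which is the same property the STUFF rewrite is meant to provide.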
You have a very valid concern, because SQL Server will rearrange the order of evaluation of expressions in the WHERE.
The only way to guarantee the order of operations in a SQL statement is to use case. I don't think there is a way to catch failing calls to substring() . . . there is no try_substring() analogous to try_convert().
So:
WHERE
LEFT(ACCT_NUM, 3) = 'GF0' AND
(CASE WHEN LEN(ACCT_NUM) > 3 THEN RIGHT(SUBSTRING(ACCT_NUM, 4, LEN(ACCT_NUM) - 3), 7) END) NOT IN (SELECT acct_num
FROM account_x_master
) AND
(CASE WHEN LEN(ACCT_NUM) > 3 THEN RIGHT(SUBSTRING(ACCT_NUM, 4, LEN(ACCT_NUM) - 3), 7) END) NOT IN (SELECT CW_CLIENT_STAGE.AGS_NUM
FROM dbo.CW_CLIENT_STAGE
)
This is uglier. And, there may be ways around it, say by using LIKE with wildcards rather than string manipulation. But, the case will guarantee that the SUBSTRING() is only run on strings long enough so no error is generated.
Please try the below query.
Since AND and OR do not short-circuit in a SQL WHERE clause, the only way to achieve this is via CASE syntax.
I noticed that you had two NOT IN comparisons in different parts of WHERE which I combined into one.
Note that CASE condition is >=3 and not >3, as RIGHT('',x) is allowed.
Also note the proper use of CASE with NOT IN
UPDATE table_1
SET
Error_Ind = 'GW001'
-- select * from table_1
WHERE
LEFT(ACCT_NUM, 3) = 'GF0'
AND CASE
WHEN LEN(ACCT_NUM)>=3
THEN RIGHT(SUBSTRING(ACCT_NUM, 4, LEN(ACCT_NUM) - 3), 7)
ELSE NULL END NOT IN
(
SELECT acct_num as num
FROM account_x_master
UNION
SELECT CW_CLIENT_STAGE.AGS_NUM as num
FROM dbo.CW_CLIENT_STAGE
)