exit_reason
sr_inefficient_management
tech_too_complex
company_member_resignation
sr_product_engagement
sr_contractual_reasons
sr_contractual_reasons-expectation_issues
sr_churn-takeover_business
I would like to strip the prefix when the value starts with "sr_" and keep the rest as it is. If the value contains "-", such as "sr_contractual_reasons-expectation_issues", I only want to keep the part before the hyphen, i.e. "contractual reasons".
So far, my idea is to use
case when exit_reason like '%inefficient_management%' then 'inefficient management'
but if there are many different values, I am in trouble.
Expected output
exit_reason column
inefficient management
tech too complex
company member resignation
product engagement
contractual reasons
contractual reasons
churn
You can just replace 'sr_'
replace(exit_reason, 'sr_', '')
It is unlikely that 'sr_' would appear anywhere else in a reason. But you can use regexp_replace() anchored to the start of the string to be sure:
regexp_replace(exit_reason, '^sr_', '')
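You can try something like this:
REPLACE(
CASE
-- escape the underscore: a bare _ in a LIKE pattern matches any single character
WHEN exit_reason LIKE 'sr!_%' ESCAPE '!'
-- drop the 3-character prefix, then keep only the part before the first hyphen
THEN split_part(substr(exit_reason, 4), '-', 1)
ELSE split_part(exit_reason, '-', 1)
END
, '_', ' '
)
This code strips a leading 'sr_' if 'exit_reason' has one, keeps only the part before the first hyphen, and then replaces all underscores with blanks.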
To also remove the suffix, you could use:
SELECT replace(
regexp_replace(
'sr_contractual_reasons-expectation_issues',
'^(sr_)?([^-]*).*$',
'\2'
),
'_',
' '
);
replace
═════════════════════
contractual reasons
(1 row)
The regular expression matches an optional leading sr_, then all characters until the first -, then anything that follows that, and keeps only the middle part. replace then replaces underscores with spaces.
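The pattern can also be checked against every sample value from the question outside the database. This is a minimal sketch in Python, assuming its `re` module handles this particular pattern the same way the SQL engine does; the helper name `clean_reason` is made up for illustration.

```python
import re

# Hypothetical helper mirroring
# replace(regexp_replace(x, '^(sr_)?([^-]*).*$', '\2'), '_', ' ')
def clean_reason(value: str) -> str:
    # Keep only group 2: the text after an optional "sr_" and before the first "-".
    core = re.sub(r"^(sr_)?([^-]*).*$", r"\2", value)
    # Underscores become spaces, as in the outer replace().
    return core.replace("_", " ")

samples = [
    "sr_inefficient_management",
    "tech_too_complex",
    "company_member_resignation",
    "sr_product_engagement",
    "sr_contractual_reasons",
    "sr_contractual_reasons-expectation_issues",
    "sr_churn-takeover_business",
]

cleaned = [clean_reason(s) for s in samples]
for before, after in zip(samples, cleaned):
    print(f"{before} -> {after}")
```

Values without the prefix pass through unchanged apart from the underscores, because the first group is optional.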
Community,
I need assistance with removing the underscores '_' and making the name readable, with the first letter of each word in upper case, while removing the number as well. I hope this makes sense. I am running Presto and using Query Fabric. Is there a better way to write this syntax?
Email Address
Full_Metal_Jacket#movie.com
TOP_GUN2#movie.email.com
Needed Outcome
Full Metal Jacket
Top Gun
Partially working solution:
,REPLACE(SPLIT_PART(T.EMAIL, '#', 1),'_',' ') Name
Something like this:
,LOWER(REPLACE(UPPER(SPLIT_PART(T.EMAIL, '#', 1)),'_',' '))Name
Try this:
WITH t(email) AS (
VALUES 'Full_Metal_Jacket#movie.com', 'TOP_GUN2#movie.email.com'
)
SELECT array_join(
transform(
split(regexp_extract(email, '(^[^0-9#]+)', 1), '_'),
part -> upper(substr(part, 1, 1)) || lower(substr(part, 2))),
' ')
FROM t;
How it works:
extract the non-numeric prefix up to the # using a regex via regexp_extract
split the prefix on _ to produce an array
transform the array by capitalizing the first letter of each element and lowercasing the rest.
Finally, join them all together with a space using the array_join function.
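The four steps above can be traced outside Presto as well. Here is a rough Python sketch of the same pipeline, using the '#' separator exactly as it appears in the sample data; the function name `readable_name` is an assumption made for illustration and is not part of the query.

```python
import re

def readable_name(email: str) -> str:
    # Step 1: take the non-numeric prefix up to the '#' separator.
    match = re.search(r"(^[^0-9#]+)", email)
    prefix = match.group(1) if match else ""
    # Step 2: split the prefix on underscores.
    parts = prefix.split("_")
    # Step 3: upper-case the first letter of each part, lower-case the rest.
    words = [part[:1].upper() + part[1:].lower() for part in parts]
    # Step 4: join the words with single spaces.
    return " ".join(words)

emails = ["Full_Metal_Jacket#movie.com", "TOP_GUN2#movie.email.com"]
names = [readable_name(e) for e in emails]
print(names)
```

Because the prefix stops at the first digit, the trailing 2 in TOP_GUN2 never reaches the split step.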
Update:
Here's another variant without involving transform and the intermediate array:
regexp_replace(
replace(regexp_extract(email, '(^[^0-9#]+)', 1), '_', ' '),
'(\w)(\w*)',
x -> upper(x[1]) || lower(x[2]))
Like the approach above, it first extracts the non-numeric prefix, then it replaces underscores with spaces with the replace function, and finally, it uses regexp_replace to process each word. The (\w)(\w*) regular expression captures the first letter of the word and the rest of the word into two separate capture groups. The x -> upper(x[1]) || lower(x[2]) lambda expression then capitalizes the first letter (first capture group -- x[1]) and lower cases the rest (second capture group -- x[2]).
I have a SQL query which returns some rows having the below format:
DB_host
DB_host_instance
How can I filter to get only the rows that have the format 'DB_host' (that is, a condition that returns values with exactly one occurrence of '_')?
I tried using [0-9a-zA-Z_0-9a-zA-Z], but it seems that is not right. Please suggest.
One option is REGEXP_COUNT. If at most one underscore is allowed, use
WHERE REGEXP_COUNT( col, '_' ) <= 1
or, if exactly one underscore must be present, use
WHERE REGEXP_COUNT( col, '_' ) = 1
A simple method is a regular expression:
where regexp_like(col, '^[^_]+_[^_]+$')
This matches the full string when there is a string with no underscores followed by an underscore followed by another string with no underscores.
You could also do this with LIKE, but it is more complicated:
where col like '%\_%' and col not like '%\_%\_%'
That is, has one underscore but not two. The \ is needed because _ is a wildcard for LIKE patterns.
You can suppress underscores in the string, and ensure that the length of the result is just one character less than the original:
where len(replace(col, '_', '')) = len(col) - 1
I wonder how this method would compare to a regex or two LIKE conditions in terms of efficiency on a large dataset. I would not be surprised if it were more efficient.
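The three predicates are not fully interchangeable, which a quick experiment can show. The sketch below uses Python's built-in sqlite3 as a stand-in engine, so it makes some assumptions: `length()` replaces `len()`, the LIKE patterns carry an explicit ESCAPE clause, and the regular expression is checked with Python's `re` because SQLite has no built-in regexp_like. The extra rows 'DBhost' and '_host' are invented test values. It says nothing about performance.

```python
import re
import sqlite3

rows = ["DB_host", "DB_host_instance", "DBhost", "_host"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [(r,) for r in rows])

# Length trick: removing underscores shortens the value by exactly one character.
length_sql = (
    "SELECT col FROM t "
    "WHERE length(replace(col, '_', '')) = length(col) - 1 "
    "ORDER BY rowid"
)
by_length = [r[0] for r in conn.execute(length_sql)]

# Two LIKE conditions: at least one underscore, but not two.
like_sql = (
    r"SELECT col FROM t "
    r"WHERE col LIKE '%\_%' ESCAPE '\' "
    r"AND col NOT LIKE '%\_%\_%' ESCAPE '\' "
    r"ORDER BY rowid"
)
by_like = [r[0] for r in conn.execute(like_sql)]

# Regex: non-empty text, one underscore, non-empty text.
by_regex = [r for r in rows if re.fullmatch(r"[^_]+_[^_]+", r)]

print(by_length, by_like, by_regex)
```

The length and LIKE variants accept '_host', while the regular expression also requires at least one character on each side of the underscore.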
I need to replace multiple characters in a string. The result can't contain any '&' or any commas.
I currently have:
REPLACE(T2.[ShipToCode],'&','and')
Which converts & to and, but how do you put multiple values in one line?
You just need to daisy-chain them:
REPLACE(REPLACE(T2.[ShipToCode], '&', 'and'), ',', '')
One comment mentions "dozens of replace calls"... if removing dozens of single characters, you could also use Translate and a single Replace.
REPLACE(TRANSLATE(T2.[ShipToCode], '[];'',$#', '#######'), '#', '')
We used a function to do something similar that looped through the string, though this was mostly to remove characters that were not in the "@ValidCharacters" string. That was useful for removing anything that we didn't want, usually non-alphanumeric characters, though I think we also had space, quote, single quote and a handful of others in that string. It was really used to remove the non-printing characters that tended to sneak in at times, so it may not be perfect for your case, but it may give you some ideas.
CREATE FUNCTION [dbo].[ufn_RemoveInvalidCharacters]
(@str VARCHAR(8000), @ValidCharacters VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
-- Repeatedly find the first character that is not in @ValidCharacters and remove every occurrence of it
WHILE PATINDEX('%[^' + @ValidCharacters + ']%', @str) > 0
SET @str = REPLACE(@str, SUBSTRING(@str, PATINDEX('%[^' + @ValidCharacters + ']%', @str), 1), '')
RETURN @str
END
If you need fine control, it helps to indent-format the REPLACE() nesting for readability.
SELECT Title,
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(RTRIM(Title),
' & ',''),
'++', ''),
'/', '-'),
'(',''),
')',''),
'.',''),
',',''),
' ', '-')
AS Title_SEO
FROM TitleTable
If you use SQL Server 2017 or 2019 you can use the TRANSLATE function.
TRANSLATE(ShipToCode, '|+,-', '____')
In this example the pipe, plus, comma and minus are all replaced by an underscore.
Each character can also be mapped to its own replacement.
So in the next example the plus and minus are replaced by a hash.
TRANSLATE(ShipToCode, '|+,-', '_#_#')
Just make sure the number of characters is the same in both groups.
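The positional mapping can be tried without a SQL Server instance. Python's `str.translate` follows the same idea of pairing characters by position, so the sketch below is only an analogy and not SQL Server itself; the sample string is invented.

```python
# Positional mapping: '|' -> '_', '+' -> '#', ',' -> '_', '-' -> '#'
table = str.maketrans("|+,-", "_#_#")
result = "A|B+C,D-E".translate(table)
print(result)

# Like TRANSLATE, the two character lists must have the same length.
try:
    str.maketrans("|+,-", "_#")
    length_checked = False
except ValueError:
    length_checked = True
print(length_checked)
```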
Hope this helps someone.
If you want to replace multiple words or characters in a string with an empty string (i.e. remove them), use regexp_replace() instead of multiple replace() calls.
SELECT REGEXP_REPLACE("Hello world!123SQL$##$", "[^\w+ ]", "")
The above query will return Hello world123SQL
The same process will be applied if you want to remove multiple words from the string.
If you want to remove Hello and World from the string Hello World SQL, then you can use this query.
SELECT REGEXP_REPLACE("Hello World SQL", "(Hello|World)", "")
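This will return SQL, preceded by the two spaces that surrounded the removed words; wrap the call in TRIM() if they are unwanted.
With this approach, the query does not look redundant and you do not have to maintain multiple replace() calls.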
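Both patterns can be checked quickly outside the database. This is a small sketch in Python, assuming its `re` module treats these two patterns the same way as the SQL engine's REGEXP_REPLACE; `repr()` is used so that leftover spaces stay visible.

```python
import re

# Character class: drop everything that is not a word character, '+' or a space.
stripped = re.sub(r"[^\w+ ]", "", "Hello world!123SQL$##$")

# Alternation: drop the listed words; the spaces around them are left behind.
without_words = re.sub(r"(Hello|World)", "", "Hello World SQL")

print(repr(stripped))
print(repr(without_words))
print(repr(without_words.strip()))
```

Removing whole words leaves the separating spaces in place, so a trim step is usually wanted afterwards.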
Conclusion
If you want to replace words with an empty string, go with REGEXP_REPLACE().
If you want to replace words with other words, for example replacing & with and, use replace(). If there are multiple words to replace, nest multiple replace() calls.
I don't recall ever seeing a field like this before, but it combines the city, state, and zipcode into a single string of varchar2. Fortunately, I believe most of the fields are in the same city space state, space zipcode format, but I started finding a few that deviated from that norm.
Right now I'm trying to identify all these distinct conditions
in the database with over 5 million rows and my queries aren't working for what I wanted.
I started with:
SELECT PROJECT_CTY_ST_ZIP FROM PAYMENT WHERE PROJECT_CTY_ST_ZIP LIKE '%' || CHR(32) || '%';
Then tried:
SELECT PROJECT_CTY_ST_ZIP FROM PAYMENT WHERE PROJECT_CTY_ST_ZIP LIKE '% %' AND PROJECT_CTY_ST_ZIP LIKE '% %' AND PROJECT_CTY_ST_ZIP LIKE '% %';
but both return rows based on leading and trailing spaces, and I really wanted to find spaces inside the text. I don't want to remove them, just identify them with a query so I can parse them properly in my Java code and later insert them into city, state, and zipcode fields in another table.
While it doesn't show it here, I found this field in IA with no leading spaces, then one leading space and then two leading spaces. I fixed the leading spaces with trim.
WEST LIBERTY, IA 52776
This last one I wasn't expecting and I wanted to see if there are other conditions that might be unusual, but my query doesn't find them as the spaces are in the middle of the text:
TRUTH OR CONSEQUENCE, NM 87901
How would I go about a query to find these kinds of distinct records?
This query replaces each of the spaces with a dot (.) so you can see them
SELECT
REGEXP_REPLACE(PROJECT_CTY_ST_ZIP,
'([[:space:]])',
'.') spaces_or_now_dots
FROM PAYMENT
This query finds the ones that have one or more spaces.
SELECT PROJECT_CTY_ST_ZIP
FROM PAYMENT
where REGEXP_LIKE(PROJECT_CTY_ST_ZIP,
'[[:space:]]'
)
I have not considered the cases of spaces in the beginning and end, because you have already taken care of them.
I would like to ask for advice on records testing.
At this point, I have an account field that must consist of numbers only. Nevertheless, this is a varchar field because of the leading zeroes.
I had this query that actually shows me non-digits in the account number (or null).
LTRIM(TRANSLATE(ACCOUNT,'0123456789',' '),' ') INVALID_DATA
Nevertheless, I just faced another issue- spaces are not taken into account, and therefore if account has a space, it goes unnoticed as null. Yes, I can replace space with something more recognizable, but will it be enough? I am sure there are other exceptions I don't know about.
Is there any universal way to check for ANY POSSIBLE variation that is not a number? maybe something like this? How reliable is it?
LENGTH(ACCOUNT)-LENGTH(TO_NUMBER(REGEXP_REPLACE((ACCOUNT), '[^[:digit:]]+', ''))) NON-NUMBERS
Also, how to detect and account for cases with non-Unicode characters?
I would just use a regexp_replace approach to find non-numbers in a string:
regexp_replace(account, '\d+')
Explanation
- \d is the metacharacter for a digit character.
- The + symbol is a quantifier meaning one or more occurrences of the preceding digit.
Thus, all digits are removed from the account column; wherever the result is non-null, the account contains non-digit characters.
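The idea of stripping the digits and inspecting what is left can be tried on a few invented account values. This is a rough Python sketch, not the database's behaviour: `non_digit_residue` is a hypothetical helper, and the `re.ASCII` flag is a deliberate choice so that only 0-9 count as digits (by default Python's `\d` also accepts other Unicode digits). How the database's own `\d` treats such characters is not covered here.

```python
import re

def non_digit_residue(account: str) -> str:
    # Remove every run of ASCII digits; whatever remains is invalid data.
    return re.sub(r"\d+", "", account, flags=re.ASCII)

# Invented samples: clean, embedded space, embedded letter, non-ASCII digits.
accounts = ["000123", "12 34", "12A4", "\u0661\u0662\u0663"]

residue = {a: non_digit_residue(a) for a in accounts}
invalid = [a for a in accounts if residue[a] != ""]

for account in accounts:
    print(repr(account), "->", repr(residue[account]))
```

An embedded space survives as residue here, which is the case the original TRANSLATE check missed.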
~~~~~~~~~~~~
With respect to your calculation:
LENGTH(ACCOUNT)-LENGTH(TO_NUMBER(REGEXP_REPLACE((ACCOUNT), '[^[:digit:]]+', ''))) NON-NUMBERS
Your regular expression looks for non-digits and removes them. Your approach will work, although I do not like converting the string to a number and then implicitly casting it back to a string. If you use this in a condition, it needs NVL handling. Assuming Oracle, where an empty string is NULL, an account with no digits at all makes the inner LENGTH return NULL, so that LENGTH needs its own NVL as well:
NVL(LENGTH(ACCOUNT) - NVL(LENGTH(REGEXP_REPLACE(ACCOUNT, '[^[:digit:]]+', '')), 0), 0) NON_NUMBERS
I think you have the right idea:
LTRIM(TRANSLATE(ACCOUNT, ' 0123456789', 'X'), ' ') as INVALID_DATA
Should return a non-empty string when there is a space.
In a where clause you would use:
where LENGTH( LTRIM(TRANSLATE(ACCOUNT, ' 0123456789', 'X'), ' ')) > 0
You can replace all spaces like this:
SELECT REPLACE(fld_or_variable, ' ', '')
Or you can trim spaces like this
SELECT LTRIM(RTRIM(' Amit Tech Corp '))