Alternative to like/ like any?... regexp? - sql

I have a TERADATA dataset that resembles the below:
Customer_ID | Targeting_Region
12 | targ=EU, targ=!Eu.Fr
34 | targ=Asia
56 | targ=!EU
The '!' denotes 'does not equal'. For example, the customer in Row #1 wants to target the EU, but exclude France.
I want to create a field that flags (with a '1') any row where there is 'positive' targeting. By 'positive' targeting I mean any row where a specific region has been explicitly INCLUDED ('negative' targeting would be where a region is explicitly EXCLUDED, such as the exclusion of France in Row #1). For example, Row #1 contains both positive and negative targeting, Row #2 contains only positive targeting, and Row #3 contains only negative targeting.
The problem I am encountering is that a simple case statement won't work (as far as I can tell). I have tried the 2 statements below:
(case when targeting_region like '%targ=%' then 1 else 0 end) as target_flag
(case when ((targeting_region like '%targ=%') and (targeting_region not like '%targ=!%')) then 1 else 0 end) as target_flag
The 1st statement above doesn't work because it returns 1 for both 'targ=' and 'targ=!'.
The 2nd statement above doesn't work because it returns 1 only for rows that have ONLY positive targeting. As such, Row #1 (above) would return a 0 (I want it to return a 1).
Note that the value following 'targ=' could also be a number, e.g., 'targ=12345'.
Any ideas on how I could accomplish this? I have heard that Teradata has something called regexp, but I have been unable to find a good explanation of it after quite a bit of searching.
Thanks!

Maybe not exactly what you're looking for, but if you want a 1 only when there is a positive target and no negative target, then why not make it 0 if there exists a negative target and 1 otherwise?
For example,
case when targeting_region like '%targ=!%' then 0
when targeting_region like '%targ=%' then 1
else null -- Optional if you want to handle when no targeting regions exist
end as target_flag

Would something like this work?
(case when REGEXP_INSTR(targeting_region,'targ=[A-Z,a-z]') = 0 then 0 else 1 end)
I found syntax and example of REGEXP_INSTR() at
http://www.info.teradata.com/HTMLPubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1145_111A/Regular_Expr_Functions.085.03.html#ww14955402
Because there was too little info at this site, you will have to fiddle with it to get it to work.
For example...
The equal sign in "...targ=...", and maybe even the left and right brackets, may need to be escaped, perhaps with backslash. Also, the above assumes that if there is no match, the function returns 0 (rather than NULL). It may need to be changed from "=0" to "IS NULL". Also, I assume that the parameters after the first two are optional. You may need to specify them, e.g., "1,1,i". Also, the expression could be simplified a bit, for example by using a shortcut for [A-Z,a-z], if you can find better documentation.
Explanation:
The second parameter specifies a "pattern" to look for in the first parameter.
1. "targ=" looks for exactly those characters.
2. "[A-Z,a-z]" looks for an alphabetic character. If a "!" occurs, it will not match and the search will proceed with the rest of the string.
3. REGEXP_INSTR() returns the character position where the pattern was found in the string. That's overkill because you only want to know whether it was found or not, but hopefully it works because I couldn't find a simpler function.

If I understood you correctly you want 1 if there's any included target regardless of additional excluded regions?
This searches for 'targ=' followed by any other character than '!':
CASE WHEN REGEXP_INSTR(Targeting_Region,'targ=[^!]') = 0 THEN 0 ELSE 1 END
If your release doesn't include the REGEXP functions, OREPLACE might be available instead:
CASE WHEN POSITION('targ=' IN OREPLACE (Targeting_Region, 'targ=!', '')) > 0 THEN 1 ELSE 0 END
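To sanity-check the two expressions above without a Teradata session, here is a minimal Python sketch that mirrors their logic on the three sample rows from the question. It is only an approximation: Python's `re.search` stands in for REGEXP_INSTR and `str.replace` for OREPLACE, and the function names `flag_regex` and `flag_replace` are made up for illustration.

```python
import re

# Sample rows from the question: Customer_ID -> Targeting_Region
rows = {
    12: "targ=EU, targ=!Eu.Fr",
    34: "targ=Asia",
    56: "targ=!EU",
}

def flag_regex(value: str) -> int:
    # Stand-in for: REGEXP_INSTR(value, 'targ=[^!]') <> 0
    # 'targ=' followed by any character other than '!'
    return 1 if re.search(r"targ=[^!]", value) else 0

def flag_replace(value: str) -> int:
    # Stand-in for: POSITION('targ=' IN OREPLACE(value, 'targ=!', '')) > 0
    # Remove every negative marker first, then look for a remaining 'targ='
    return 1 if "targ=" in value.replace("targ=!", "") else 0

for customer_id, region in rows.items():
    print(customer_id, flag_regex(region), flag_replace(region))
```

Both approaches flag rows 12 and 34 and leave row 56 unflagged. They also accept numeric values such as 'targ=12345', since neither depends on the character class after the equals sign being alphabetic.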

Related

SQL computed column based on a special character on another column

I am not good with SQL at all; I barely have an idea of how to do basic scripts such as delete, drop, add.
I have this data with about 12 columns, I want to add a calculated column which will change depending if a special character shows up in another column.
Let's say:
A       C
Money$  YES
Money   NO
That is the idea: I want to create a column C that says YES if there is a $ sign in column A. Is this possible? I am assuming you can use something similar to an if condition, but I have no experience with SQL scripting.
You would use a case expression and like:
select t.*,
(case when a like '%$%' then 'YES' else 'NO' end) as c
from t;
The following is just commentary.
This is very basic syntax for SQL. I would recommend that you spend some time to learn the basics. Learning-as-you-go is an okay approach -- assuming you have some fundamentals to build on. Otherwise, you are likely to spend a lot of time to learn a few things, and you may not learn the best way to do things.
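As a quick way to try the CASE/LIKE query on the two sample values, here is a small sketch using Python's built-in `sqlite3` module. SQLite is an assumption made purely for convenience (the question does not say which database is in use); the table name `t` and column `a` follow the answer's query.

```python
import sqlite3

# In-memory database with the sample values from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT)")
conn.executemany("INSERT INTO t (a) VALUES (?)", [("Money$",), ("Money",)])

# Same CASE expression as in the answer: YES when column a contains a '$'
result = conn.execute(
    "SELECT a, CASE WHEN a LIKE '%$%' THEN 'YES' ELSE 'NO' END AS c "
    "FROM t ORDER BY rowid"
).fetchall()
conn.close()

for a, c in result:
    print(a, c)
```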
Yes, this is possible. You'll have to replace the parts in braces ({}) with the appropriate object names. I also use a bit rather than 'Yes'/'No', as that seems better suited:
ALTER TABLE {YourTable} ADD {New Column Name} AS CONVERT(bit, CASE WHEN {Column} LIKE '%$%' THEN 1 ELSE 0 END) PERSISTED;
Note that this will return 0 if the column ({Column}) has a value of NULL, not NULL; unsure if this is the correct logic however, this should be more than enough to get the ball rolling. If not, read up on the CASE expression and NULL logic.
A pattern match can help you find out whether a string contains a character you consider special:
SELECT
ColumnA
, SUBSTRING(ColumnA, PATINDEX('%[^ a-zA-Z0-9]%', ColumnA), 1) AS FirstSpecialChar
FROM
YourTable -- placeholder table name
WHERE
ColumnA LIKE '%[^ a-zA-Z0-9]%'
;
The pattern [^ a-zA-Z0-9] will match any character which is not a number, a space or an alphabetic character (note the ^ at the beginning of the character group, which means NOT).
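The same character class can be tried out in Python, as a rough illustration of what the pattern picks up. This is a sketch only: `first_special_char` is a hypothetical helper, and SQL Server's range matching can depend on collation, which Python's `re` does not model.

```python
import re
from typing import Optional

# Any character that is not a space, an ASCII letter, or a digit
SPECIAL = re.compile(r"[^ a-zA-Z0-9]")

def first_special_char(value: str) -> Optional[str]:
    # Returns the first "special" character, or None when there is none
    match = SPECIAL.search(value)
    return match.group(0) if match else None

print(first_special_char("Money$"))     # the dollar sign
print(first_special_char("Money 123"))  # None: only letters, digits, space
```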
You can use a pattern to check for any special character in a column. For example:
SQL SERVER
SELECT CASE WHEN 'ABCD$' LIKE '%[^a-zA-Z0-9]%' THEN 'YES' ELSE 'NO' END as result
MYSQL
SELECT CASE WHEN 'ABCD$' REGEXP '[^a-zA-Z0-9]' = 1 THEN 'YES' ELSE 'NO' END as result
Regex can be changed as per the requirement
REGEXP '[^[:alnum:]]'

PostgreSQL - Assign integer value to string in case statement

I need to select one and only 1 row of data based on an ID in the data I have. I thought I had solved this (For details, see my original question and my solution, here: PostgreSQL - Select only 1 row for each ID)
However, I now still get multiple values in some cases. If there is only "N/A" and 1 other value, then no problem.. but if I have multiple values like: "N/A", "value1" and "value2" for example, then my case statement is not sufficient and I get both "value1" and "value2" returned to me. This is the case statement in question:
CASE
WHEN "PQ"."Value" = 'N/A' THEN 1
ELSE 0
END
I need to give a unique integer value to each string value and then the problem will be solved. The question is: how do I do this? My first thought is to somehow convert the character values to ASCII and sum them up.. but I am not sure how to do that and also worried about performance. Is there a way to very simply assign a value to each string so that I can choose 1 value only? I don't care which one actually... just that it's only 1.
EDIT
I am now trying to create a function to add up the ASCII values of each character so I can essentially change my case statement to something like this:
CASE
WHEN "PQ"."Value" = 'N/A' THEN 9999999
ELSE SumASCII("PQ"."Value")
END
Having a small problem with it though.. I have added it as a separate question, here: PostgreSQL - ERROR: query has no destination for result data
EDIT 2
Thanks to @Bohemian, I now have a working solution, which is as follows:
CASE
WHEN "PQ"."Value" = 'N/A' THEN -1
ELSE ('x'||LPAD(MD5("PQ"."Value"),16,'0'))::bit(64)::bigint
END DESC
This will produce a "unique" number for each value:
('x'||substr(md5("PQ"."Value"),1,8))::bit(64)::bigint
Strictly speaking, there is a chance of a collision, but it's very remote.
If the result is "too big", you could try modulus:
<above-calculation> % 10000
Although collisions would then be a 0.01% chance, you should try this formula against all known values to ensure there are no collisions.
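To get a feel for what the hash-to-integer expression produces, here is a hedged Python sketch of the 16-hex-digit variant, ('x'||LPAD(MD5(...),16,'0'))::bit(64)::bigint. It assumes the value is hashed as UTF-8 text and that the cast behaves as a signed 64-bit two's-complement conversion; `md5_sort_key` is an illustrative name, not a PostgreSQL function.

```python
import hashlib

def md5_sort_key(value: str) -> int:
    # MD5 gives 32 hex digits; keep the first 16 (64 bits), as LPAD(..., 16, '0') does
    hex_digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    first_64_bits = bytes.fromhex(hex_digest[:16])
    # Interpret those 64 bits as a signed integer, like a cast to bigint
    return int.from_bytes(first_64_bits, byteorder="big", signed=True)

for text in ["value1", "value2"]:
    print(text, md5_sort_key(text))
```

Note that the result can be negative, so a sentinel such as -1 for 'N/A' is not guaranteed to sort below or above every hashed value.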
If you don't care which value gets picked, change RANK() to ROW_NUMBER(). If you do care, do it anyway, but also add another term after the CASE statement in ORDER BY, separated by a comma, with the logic you want - for example if you want the first value alphabetically, do this:
...
ORDER BY CASE...END, "PQ"."Value")
...

Returning postcodes (varchars) with only one numeric character in them

I've been asked to run a query to return a list of UK post codes from a table full of filters for email reports which only have 1 number at the end. The problem is that UK post codes are of variable length; some are structured 'AA#' or 'AA##' and some are structured 'A#' or 'A##'. I only want those that are either 'AA#' or 'A#'.
I tried running the below SQL, using length and (attempting to) use regex to filter out all results which didn't match what I wanted, but I'm very new to using ranges and it hasn't worked.
SELECT PostCode
FROM ReportFilterTable RFT
WHERE RFT.FilterType = 'Postcode'
AND LEN(RFT.Postcode) < 4
AND RFT.PostCode LIKE '%[0-9]'
I think the way I'm approaching this is flawed, but I'm clueless as to a better way. Could anyone help me out?
Thanks!
EDIT:
Since I helpfully didn't include any example data originally, I've now done so below.
This is a sample of the kind of values in the column I'm returning, with examples of what I need to return and what I don't.
B1 -- Should be returned
B10 -- Should not be returned
B2 -- Should be returned
B20 -- Should not be returned
B3 -- Should be returned
B30 -- Should not be returned
SE1 -- Should be returned
SE10 -- Should not be returned
You could filter for one or two letters (and omit the length check, since it's implicit in the LIKE):
WHERE RFT.FilterType = 'Postcode' AND
(RFT.PostCode LIKE '[A-Z][0-9]' OR RFT.PostCode LIKE '[A-Z][A-Z][0-9]')
If the issue is that you are getting values with multiple digits and you are using SQL Server (as suggested by the syntax), then you can do:
WHERE RFT.FilterType = 'Postcode' AND
LEN(RFT.Postcode) < 4 AND
(RFT.PostCode LIKE '%[0-9]' AND RFT.PostCode NOT LIKE '%[0-9][0-9]')
Or, if you know there are at least two characters, you could use:
WHERE RFT.FilterType = 'Postcode' AND
LEN(RFT.Postcode) < 4 AND
RFT.PostCode LIKE '%[^0-9][0-9]'
Non-digit followed by 1 digit ... LIKE '%[^0-9][0-9]'
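The "one or two letters, then exactly one digit" rule can be checked against the sample data from the question with a short Python sketch. The regular expression here is an assumed equivalent of the two LIKE patterns '[A-Z][0-9]' and '[A-Z][A-Z][0-9]', not SQL Server syntax, and `has_single_digit` is a made-up helper name.

```python
import re

# One or two uppercase letters followed by exactly one digit
SINGLE_DIGIT_POSTCODE = re.compile(r"[A-Z]{1,2}[0-9]")

def has_single_digit(postcode: str) -> bool:
    # fullmatch mirrors LIKE without wildcards: the whole value must fit
    return SINGLE_DIGIT_POSTCODE.fullmatch(postcode) is not None

samples = ["B1", "B10", "B2", "B20", "B3", "B30", "SE1", "SE10"]
kept = [p for p in samples if has_single_digit(p)]
print(kept)
```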

IS NULL doesn't work

I have a table like the one below with the two left columns (both of them are integers), and I added two more fields to this table:
Table1:
Asset_Value   Contract_Value
-------------------------------
0             NULL
NULL          200
0             NULL
And the query:
Select
Asset_Value, Contract_Value,
Case
when Asset_Value is null
then 1
else 0
end As Ind_ForNullA,
Case
when Contract_Value is null
then 1
else 0
end As IndForNullC
from
table1
However, I get strange results:
Asset_Value   Contract_Value   Ind_ForNullA   IndForNullC
----------------------------------------------------------
0             NULL             1              1
NULL          200              0              0
0             NULL             0              1
Update: Never mind. A damn comma had been forgotten.
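For reference, here is a minimal sketch that runs the query as posted against the three sample rows, using Python's built-in `sqlite3` as a stand-in engine (an assumption; the original database is not named). With all commas in place, each indicator is 1 exactly where its source column is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Asset_Value INTEGER, Contract_Value INTEGER)")
# Python's None is stored as SQL NULL
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(0, None), (None, 200), (0, None)],
)

indicator_rows = conn.execute(
    """
    SELECT Asset_Value, Contract_Value,
           CASE WHEN Asset_Value IS NULL THEN 1 ELSE 0 END AS Ind_ForNullA,
           CASE WHEN Contract_Value IS NULL THEN 1 ELSE 0 END AS IndForNullC
    FROM table1
    ORDER BY rowid
    """
).fetchall()
conn.close()

for row in indicator_rows:
    print(row)
```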
Try with ' ' empty string instead of null.
Try to use the function ASCII
ASCII ( character_expression )
Using this function you can find out which character is really stored in the column Asset_Value: for example, a NULL value must have ASCII code '00'.
Why are you bothering to define something as NULL--which is appropriate--but, in this case or that one, changing it to one or zero (an attempt at Boolean, I imagine)--which is inappropriate?
What will you do next: decide that, in a third report, you want to have a REAL boolean that displays TRUE or FALSE?
Go with NULL consistently, throughout the entire application, everywhere, or don't. However, DO NOT MIX AND MATCH PARADIGMS and expect consistent results. Also, try not to rely upon CASE: there's very, very little legitimate reason to depend upon that, especially for something as simplistic as what you're doing.
FYI, the only reason I can conceive why you would want this "indicator" field is so that you can test whether INDICATOR_A = 1. However, since you can test ASSET IS NULL, why even bother? NEVER introduce extraneous mechanisms when there is no overpowering reason to do so.

Netsuite sql case statement where both are true

I am trying to create a formula for the following criteria.
WHEN usernotes.notetitle CONTAINS ‘collection’ AND usernotes.notedate is within the last thirty days
Here is what I have right now. It lets me write a case then set the value.
CASE WHEN ({usernotes.notetitle} CONTAINS 'collection') AND (TRUNC({today}-{usernotes.notedate}) BETWEEN 0 AND 30) THEN 1 ELSE 0 END
I don’t know if CONTAINS is the right syntax and I’m not sure if I can combine the two formulas or if I did it right.
I believe you are combining your conditions correctly with AND.
To do string comparisons, use LIKE and the % wildcard:
CASE WHEN ({usernotes.notetitle} LIKE '%collection%') AND (TRUNC({today}-{usernotes.notedate}) BETWEEN 0 AND 30) THEN 1 ELSE 0 END
If you only need to look for notes that start with collection, then just remove the first %.
See the NetSuite Help article titled SQL Expressions for details on all the SQL functions you can use.
See this page on LIKE for more details about pattern matching.
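The combined condition can be sketched in Python to confirm how the two tests interact. This is only an illustration of the logic, not NetSuite formula syntax: `collection_flag` and its parameters are hypothetical names, the substring test stands in for LIKE '%collection%', and the date subtraction stands in for TRUNC({today}-{usernotes.notedate}).

```python
from datetime import date

def collection_flag(note_title: str, note_date: date, today: date) -> int:
    # Whole days between the note date and today
    age_in_days = (today - note_date).days
    # Both conditions must hold: title contains 'collection' AND age is 0..30 days
    if "collection" in note_title and 0 <= age_in_days <= 30:
        return 1
    return 0

today = date(2024, 3, 31)
print(collection_flag("collection call", date(2024, 3, 15), today))  # recent, matching title
print(collection_flag("collection call", date(2024, 1, 1), today))   # older than 30 days
print(collection_flag("follow-up", date(2024, 3, 15), today))        # title does not match
```

The substring test here is case-sensitive, so a title such as 'Collection call' would not match unless both sides are lowercased first.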