I've got a Varchar2 field which usually holds two alphabetic characters (such as ZH, SZ, AI,...). Let's call it FOO.
Certain datasets save A or A1 - A9 into the same field. I need to select all rows except exactly those.
I used the function substr to separate the number from the A. So far so good, but < and > don't seem to work correctly with the "number-string".
How can I achieve this without converting it to a number? Is there an easier solution?
I haven't found anything on the internet and I reached my limit trying it myself.
This is my WHERE clause so far:
WHERE (substr(FOO, 0, 1) != 'A'
or (substr(FOO, 0, 1) = 'A' AND substr(FOO, 1, 1) > '9'));
It returns all the rows without restrictions.
The only solution I found:
WHERE (FOO NOT IN ('A', 'A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9'));
But this is not optimal if, at some point in the future, there are A1 - A50: I would have to add 51 strings to my WHERE clause. And, since the query is in source code, readability would suffer as well.
The solution should work on ORACLE and SQL Server.
Thanks in advance
In Oracle, substr(FOO, 0, 1) returns the same as substr(FOO, 1, 1): positions start at 1 (not 0), and a position of 0 is treated as 1.
So you should use substr(FOO, 2, 1) to get the second character.
However, it won't work in SQL Server which has SUBSTRING (not SUBSTR).
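To make the indexing point concrete, here is a small Python sketch that mimics how Oracle's SUBSTR treats its position argument. The helper oracle_substr is hypothetical and only covers non-negative positions; it illustrates the rule, it is not Oracle's implementation.

```python
def oracle_substr(value: str, position: int, length: int) -> str:
    """Mimic Oracle SUBSTR for position >= 0 (hypothetical helper)."""
    # Oracle counts from 1 and treats a position of 0 as 1.
    start = max(position, 1) - 1
    return value[start:start + length]


first_with_zero = oracle_substr("A5", 0, 1)  # same result as position 1
first_with_one = oracle_substr("A5", 1, 1)
second_char = oracle_substr("A5", 2, 1)      # the character after the 'A'
```

This is why the WHERE clause in the question compared the 'A' with '9' instead of comparing the digit.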
If you're ready to use different approaches in the different DBs, you can also try regular expressions:
Oracle
where not regexp_like(foo, '^A[1-9]{1,3}$')
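^ beginning of the string
$ end of the string
[1-9] any digit from 1 to 9 (0 is excluded, so A10 would not match; use [0-9] if it should)
{1,3} repeat the previous expression 1, 2 or 3 times (use {0,3} if a bare A should match as well)
Examples of FOOs which match / don't match '^A[1-9]{1,3}$'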
a123 -- may match / may not (depending on NLS settings regarding case sensitivity)
A123 -- match (the first symbol is 'A', the others are 3 digits)
A123b -- doesn't match (the last symbol should be a digit)
A1234 -- doesn't match (there should be 1, 2 or 3 digits at the end)
A12 -- match
A1 -- match
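The match list above can be checked outside the database with a short Python sketch. It assumes that Python's re module treats this simple pattern the same way Oracle does, and it is always case sensitive, so the NLS caveat for a123 does not apply here.

```python
import re

# Same pattern as in the Oracle example; fullmatch() anchors both ends,
# so ^ and $ are not needed.
PATTERN = re.compile(r"A[1-9]{1,3}")


def matches(foo: str) -> bool:
    return PATTERN.fullmatch(foo) is not None


samples = ["A123", "A123b", "A1234", "A12", "A1", "A", "A10"]
results = {sample: matches(sample) for sample in samples}
```

The last two samples are not in the list above: a bare A and A10 both fail to match, because the pattern needs at least one digit and never accepts a 0.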
SQL Server
REGEXP_LIKE conversion in SQL Server T-SQL
If your requirement is to include all alphabetic values except 'A' alone, consider using a LIKE expression. Note that the bracket character class used here is a T-SQL (SQL Server / Sybase) extension rather than ANSI SQL, so this will not work in Oracle:
WHERE FOO <> 'A' AND FOO NOT LIKE '%[^A-Z]%'
Related
I did not expect this to be a problem, but I'm struggling to return the first 3 numbers, including the 0's before them. In the below examples, I show a few things I've tried. I want it to return '001'. It either returns '118' or an error. It seems like every solution wants to convert them to a text, which will drop the 0's.
SELECT lpad(00118458582::text, 3, '0')
returns 118
SELECT lpad(00118458582, 3, '0')
ERROR: function lpad(integer, integer, unknown) does not exist
SELECT left(00118458582::text, 3)
returns 118
SELECT left(00118458582, 3)
ERROR: function left(integer, integer) does not exist
SELECT substring(00118458582::text, 1, 3)
returns 118
Can I get any help please? Thanks!
Your problem starts before you try to get the first 3 digits, namely that you're considering 00118458582 to be a valid INTEGER (or whatever numeric type). I mean, it's not invalid, but what happens when you run SELECT 00118458582::INTEGER? You get 118458582. Because leading zeros in those types are senseless. So you'll never have a situation as in your examples (outside of a hardcoded number with leading zeros in your query window) in your tables, because those zeros wouldn't be stored in your number-based data type fields.
So the only way to get that sort of situation is when they're string-based: SELECT '00118458582'::TEXT returns 00118458582. And at that point you can run your preferred function to get the first 3 characters, e.g. SELECT LEFT('00118458582', 3) which returns 001. But if you're planning on casting that to INTEGER or something, forget about leading zeros.
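The same effect can be reproduced outside the database. The following Python sketch is only an analogy (Python is not PostgreSQL), but it shows that the zeros are lost at the moment the value becomes a number, not when the substring is taken.

```python
raw = "00118458582"

as_number = int(raw)              # a number carries no leading zeros
from_number = str(as_number)[:3]  # first 3 characters after the round trip
from_text = raw[:3]               # first 3 characters of the original text
```

Here from_number is '118' and from_text is '001', which matches what the queries in the question returned.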
SELECT substring(00118458582::text, 1, 3)
returns 118 because it is a number 118458582 (the leading zeros are automatically dropped), that is converted to text '118458582' and it then takes the first 3 characters.
If you are trying to take the first three digits and then convert them to a number, you can try:
select substring('00118458582', 1,3::numeric)
it might actually be:
select substring('00118458582', 1,3)::numeric
I don't have a way to test right now...
lpad() refers to the total length of the returned value. So I think you want:
select lpad(00118458582::text, 12, '0'::text)
If you always want exactly 3 zeros before, then just concatenate them:
select '000' || 00118458582::text
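As a rough illustration of why lpad(..., 3, '0') returned 118 in the question, here is a Python model of the padding rule. The function below is a hypothetical stand-in for PostgreSQL's lpad() and assumes a non-empty fill string and a non-negative length.

```python
def lpad(text: str, length: int, fill: str = " ") -> str:
    """Rough model of PostgreSQL lpad(); assumes a non-empty fill string."""
    if len(text) >= length:
        # Already long enough: the value is cut down to `length` characters.
        return text[:length]
    missing = length - len(text)
    padding = (fill * missing)[:missing]
    return padding + text


truncated = lpad("118458582", 3, "0")   # the number has already lost its zeros
restored = lpad("118458582", 11, "0")   # pad back to the original 11 digits
```

The second argument is the total length of the result, so a value of 3 can only ever shorten a 9-character string.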
this is my first question here.
I am building an SQL query in which I need to verify that the version of object B is always lower than or equal to the version of object A. This is a link table; here is an example:
The query is :
SELECT *
FROM TABLE
WHERE B_VERSION <= A_VERSION
As you can see, it works for the 2 first rows, but not the third, because AA0 is detected as smaller than H08 while it shouldn't (when we arrive at Z99 the next version number is AA0 so the <= operator doesn't work anymore).
So I would like to parse the versions and compare how many letters each one contains, and use the <= operator only if both versions have the same number of letters.
However, I don't know how to do that in an SQL query, and I didn't find anything useful on Google either. Do you have a solution?
Thanks in advance
The key for solving this problem is the function PATINDEX. You can find more information here.
This query takes the value of A_VERSION and finds the first occurrence of a digit, then uses this position to split the value into two parts. The first part is left-padded with spaces because it is alphabetic, while the second part is left-padded with zeros ('0') because it is numeric.
The same process occurs for B_VERSION.
Note that in this example each part is assumed to be at most 5 characters long, so this will work in your case for versions ranging from A0 to ZZZZZ99999. Feel free to adjust as you need.
SELECT *
FROM TABLE
WHERE RIGHT(SPACE(5)
+ SUBSTRING(A_VERSION,
1,
PATINDEX('%[0-9]%', A_VERSION) - 1), 5)
+ RIGHT(REPLICATE('0', 5)
+ SUBSTRING(A_VERSION,
PATINDEX('%[0-9]%', A_VERSION),
LEN(A_VERSION)), 5)
>= RIGHT(SPACE(5) -- key(A_VERSION) >= key(B_VERSION), i.e. B_VERSION <= A_VERSION
+ SUBSTRING(B_VERSION,
1,
PATINDEX('%[0-9]%', B_VERSION) - 1), 5)
+ RIGHT(REPLICATE('0', 5)
+ SUBSTRING(B_VERSION,
PATINDEX('%[0-9]%', B_VERSION),
LEN(B_VERSION)) ,5)
If you are going to do this operation in many places, you might consider creating a function for this operation.
Hope this helps.
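For readers who want to see what the padded comparison key looks like, here is a minimal Python sketch of the same idea. The function name version_key is hypothetical, Python compares strings by code point (SQL Server uses the column's collation, which may order characters differently), and a version without any digit is treated as all letters, which is an assumption the query above does not make.

```python
import re


def version_key(version: str, width: int = 5) -> str:
    """Build a fixed-width key: letters right-aligned, digits zero-padded."""
    match = re.search(r"[0-9]", version)
    split = match.start() if match else len(version)
    letters, digits = version[:split], version[split:]
    # Equivalent to RIGHT(SPACE(5) + letters, 5) and RIGHT('00000' + digits, 5).
    return letters.rjust(width)[-width:] + digits.rjust(width, "0")[-width:]


key_h08 = version_key("H08")
key_aa0 = version_key("AA0")
plain_order_is_wrong = "AA0" < "H08"   # the problem from the question
keyed_order_is_right = key_h08 < key_aa0
```

Because a space sorts before any letter, a version with fewer letters always ends up before one with more letters, which is the ordering the question asks for.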
Many thanks! It helped a lot. However, I am using SQL Developer and cannot use PATINDEX there; I found an equivalent, REGEXP_INSTR, which works very similarly.
I used this algorithm, which keeps the rows where VERSION_B has fewer letters than VERSION_A, as well as the rows where both have the same number of letters and VERSION_B is not bigger than VERSION_A:
WHERE
(REGEXP_INSTR(VERSION_B, '[0-9]') < REGEXP_INSTR(VERSION_A, '[0-9]')) OR
(REGEXP_INSTR(VERSION_B, '[0-9]') = REGEXP_INSTR(VERSION_A, '[0-9]') AND VERSION_B <= VERSION_A)
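The logic of this WHERE clause can be sketched in Python as follows. The helper names are hypothetical, and first_digit_pos returns a 1-based position (0 when there is no digit) to mirror what REGEXP_INSTR is assumed to return here.

```python
import re


def first_digit_pos(version: str) -> int:
    """1-based position of the first digit, or 0 if there is none."""
    match = re.search(r"[0-9]", version)
    return match.start() + 1 if match else 0


def b_not_newer_than_a(version_b: str, version_a: str) -> bool:
    pos_b, pos_a = first_digit_pos(version_b), first_digit_pos(version_a)
    # Fewer letters always means an older version; with the same number of
    # letters, fall back to the plain string comparison.
    return pos_b < pos_a or (pos_b == pos_a and version_b <= version_a)


h08_vs_aa0 = b_not_newer_than_a("H08", "AA0")
aa0_vs_h08 = b_not_newer_than_a("AA0", "H08")
```

One caveat of the plain string comparison in the second branch: it is only reliable when the numeric parts have the same number of digits (A9 would compare as larger than A10).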
I have this query
SELECT text
FROM book
WHERE lyrics IS NULL
AND MOD(TO_NUMBER(SUBSTR(text,18,16)),5) = 1
sometimes the string is something like this $OK$OK$OK$OK$OK$OK$OK, sometimes something like #P,351811040302663;E,101;D,07112018134733,07012018144712;G,4908611,50930248,207,990;M,79379;S,0;IO,3,0,0
I would like to know if it is possible to prevent ORA-01722: invalid number, because in some cases the characters at that position are not a number.
I run this query inside a procedure and process all the rows in a cursor; if one row does not contain a number, I can't process any row.
You could use VALIDATE_CONVERSION if you are on Oracle 12c Release 2 (12.2) or later:
WITH book(text) AS
(SELECT '#P,351811040302663;E,101;D,07112018134733,07012018144712;G,4908611,50930248,207,990;M,79379;S,0;IO,3,0,0'
FROM DUAL
UNION ALL SELECT '$OK$OK$OK$OK$OK$OK$OK'
FROM DUAL
UNION ALL SELECT '12I45678912B456781234567812345671'
FROM DUAL)
SELECT *
FROM book
WHERE CASE
WHEN VALIDATE_CONVERSION(SUBSTR(text,18,16) AS NUMBER) = 1
THEN MOD(TO_NUMBER(SUBSTR(text,18,16)),5)
ELSE 0
END = 1 ;
Output
TEXT
12I45678912B456781234567812345671
Assuming the condition should be true if and only if the 16-character substring starting at position 18 is made up of 16 digits, and the number is equal to 1 modulo 5, then you could write it like this:
...
where .....
and case when translate(substr(text, 18, 16), 'z0123456789', 'z') is null
and substr(text, 33, 1) in ('1', '6')
then 1 end
= 1
This will check that the substring is made up of all-digits: the translate() function will replace every occurrence of z in the string with itself, and every occurrence of 0, 1, ..., 9 with nothing (it will simply remove them). The odd-looking z is needed due to Oracle's odd implementation of NULL and empty strings (you can use any other character instead of z, but you need some character so no argument to translate() is NULL). Then - the substring is made up of all-digits if and only if the result of this translation is null (an empty string). And you still check to see if the last character is 1 or 6.
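The same check can be sketched in Python to show why no numeric conversion is needed. The function name is hypothetical; str.translate with a deletion table plays the role of Oracle's translate(), and the sample strings are the ones used elsewhere in this thread.

```python
# Translation table that deletes every ASCII digit.
DELETE_DIGITS = str.maketrans("", "", "0123456789")


def is_one_modulo_five(text: str) -> bool:
    chunk = text[17:33]  # SUBSTR(text, 18, 16): 1-based positions 18..33
    return (
        len(chunk) == 16
        and chunk.translate(DELETE_DIGITS) == ""  # nothing left: all digits
        and chunk[-1] in ("1", "6")               # last digit decides n mod 5
    )


samples = [
    "#P,351811040302663;E,101;D,07112018134733,07012018144712;G,4908611,50930248,207,990;M,79379;S,0;IO,3,0,0",
    "$OK$OK$OK$OK$OK$OK$OK",
    "12I45678912B456781234567812345671",
]
flags = [is_one_modulo_five(sample) for sample in samples]
```

A decimal number leaves remainder 1 when divided by 5 exactly when its last digit is 1 or 6, which is why inspecting the final character is enough.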
Note that I didn't use any regular expressions; this is important if you have a large amount of data, since standard string functions like translate() are much faster than regular expression functions. Also, everything is based on character data type - no math functions like mod(). (Same as in Thorsten's answer, which was only missing the first part of what I suggested here - checking to see that the entire substring is made up of digits.)
SELECT text
FROM book
WHERE lyrics IS NULL
AND case when regexp_like(SUBSTR(text,18,16),'^[0-9]+$') then MOD(TO_NUMBER(SUBSTR(text,18,16)),5)
else null
end = 1;
Original Query:
SELECT F4105.COUNCS/10000 FROM F4105
Output:
Numeric Expression
--------------------
111.1643000000000000
111.1633000000000000
111.1633000000000000
101.7654000000000000
101.7654000000000000
112.7258000000000000
I need to remove at least the last 5 zeroes. I tried to do a substring but it didn't work.
Here are the queries I tried:
(1)
SELECT SUBSTR((F4105.COUNCS/10000 AS 'co'),length((co)-5) FROM F4105
(2)
SELECT SUBSTR((F4105.COUNCS/10000),length((F4105.COUNCS/10000)-5)) FROM F4105
The 1st query gave me an error:
Token F4105 was not valid. Valid tokens: (.
The 2nd query ran but gave the wrong output:
SUBSTR
00
000000
000000
000000
000000
000000
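You are mixing the column alias definition into the expression. To keep everything except the last five characters, the start position should be 1 and the length calculation should be the third argument. So, the correct expression is more like:
SELECT SUBSTR(F4105.COUNCS/10000.0, 1, length(F4105.COUNCS/10000.0) - 5) as co FROM F4105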
I wouldn't recommend doing this, however. You have a numeric expression. Just convert it to a decimal representation that you want, say:
SELECT CAST(F4105.COUNCS/10000.0 as DECIMAL(10, 5)) FROM F4105
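To illustrate the idea of fixing the scale numerically instead of cutting characters off a string, here is a Python sketch using the decimal module. It is an analogy, not DB2: the function name is hypothetical, the input values are the ones implied by the output in the question, and the use of truncation (ROUND_DOWN) is an assumption about how the cast behaves.

```python
from decimal import Decimal, ROUND_DOWN

FIVE_PLACES = Decimal("0.00001")


def scaled_cost(councs: str) -> str:
    """Divide by 10000 and keep exactly five digits after the decimal point."""
    value = Decimal(councs) / Decimal(10000)
    return str(value.quantize(FIVE_PLACES, rounding=ROUND_DOWN))


first_row = scaled_cost("1111643")
last_row = scaled_cost("1127258")
```

The result always has a scale of five, no matter how many digits the division itself produces, so no string arithmetic on the length is required.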
The syntax for the SUBSTR scalar is effectively SUBSTR(expression, start-pos, for-length); see IBM i 7.1 > Database > Reference > SQL reference > Built-in functions > Scalar functions > SUBSTR.
The LENGTH() expression shown used in the OP is specified for the second argument; i.e. the start-pos argument. As a starting position, the result of that string-length minus five calculation is conspicuously incorrect for obtaining the leftmost data; i.e. the starting-position is five bytes less than the length of the string. That would locate, of course, some insignificant zeroes five bytes from the end of the string-representation of the decimal-result of the division.
An effective correction, therefore, would be either of:
• insert the constant integer value of 1 for the start-pos argument [thus making the LENGTH() expression become the third argument]
• replace the SUBSTR scalar with the LEFT scalar.
Either of those revisions would achieve something that at least resembles what is alluded to as the desired output. However, without either the DDL or the explicit desired output being expressed in the OP, the actual effect of those revised expressions could only be guessed.
Anyhow, even with either of those changes, those suggested alternative character-string expressions remain as similarly poor [approaching daft] a choice of expressions as the one in the OP, per lack of explicit casting; i.e. the two revised expressions suggested as possibly corrective [yet that remain similarly unlikely to yield desirable results] are:
SUBSTR((F4105.COUNCS/10000), 1,length((F4105.COUNCS/10000)-5))
LEFT((F4105.COUNCS/10000),length((F4105.COUNCS/10000)-5))
Having established data-type\length attributes using explicit casting [i.e. established even without some actual DDL to do so] in a derived-table expression that generates the input values that would produce the output shown in the OP, from a list of literal numeric values, the character-string expression in the following query ensures that only the eleven digits of decimal-precision to the right of the decimal point [i.e. the scale] are maintained; thus visually, the effect is that the trailing five digits are truncated:
with F4105 (COUNCS) as ( values
( dec( 1111643. , 9, 2 ) )
,( dec( 1111633. , 9, 2 ) )
,( dec( 1111633. , 9, 2 ) )
,( dec( 1017654. , 9, 2 ) )
,( dec( 1017654. , 9, 2 ) )
,( dec( 1127258. , 9, 2 ) )
)
SELECT cast( dec( (F4105.COUNCS/10000), 17, 11 ) as varchar(19) )
FROM F4105
I need to filter out junk data in SQL (SQL Server 2008) table. I need to identify these records, and pull them out.
Char[0] = A..Z, a..z
Char[1] = 0..9
Char[2] = 0..9
Char[3] = 0..9
Char[4] = 0..9
{No blanks allowed}
Basically, a clean record will look like this:
T1234, U2468, K123, P50054 (4 record examples)
Junk data looks like this:
T12.., .T12, MARK, TP1, SP2, BFGL, BFPL (7 record examples)
Can someone please assist with a SQL query to do a LEFT and RIGHT method and extract those characters, and do a LIKE IN or something?
A function would be great though!
The following should work in a few different systems:
SELECT *
FROM TheTable
WHERE Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9]%'
AND Data NOT LIKE '% %'
This approach will indeed match P2343, P23423JUNK, and other similar text but requires that the format is A0000*.
Now, if the OP implies a format of 1st position is a character and all succeeding positions are numeric, as in A0+, then use the following (in SQL Server and a good deal of other database systems):
SELECT *
FROM TheTable
WHERE SUBSTRING(Data, 1, 1) LIKE '[A-Za-z]'
AND SUBSTRING(Data, 2, LEN(Data) - 1) NOT LIKE '%[^0-9]%'
AND LEN(Data) >= 5
To incorporate this into a SQL Server 2008 function, since this appears to be what you'd like most, you can write:
CREATE FUNCTION ufn_IsProperFormat(@Data VARCHAR(50))
RETURNS BIT
AS
BEGIN
RETURN
CASE
WHEN SUBSTRING(@Data, 1, 1) LIKE '[A-Za-z]'
AND SUBSTRING(@Data, 2, LEN(@Data) - 1) NOT LIKE '%[^0-9]%'
AND LEN(@Data) >= 5 THEN 1
ELSE 0
END
END
...and call into it like so:
SELECT *
FROM TheTable
WHERE dbo.ufn_IsProperFormat(Data) = 1
...this query needs to change for Oracle queries because Oracle doesn't appear to support bracket notation in LIKE clauses:
SELECT *
FROM TheTable
WHERE REGEXP_LIKE(Data, '^[A-Za-z]\d{4,}$')
This is the expansion gbn is doing in his answer, but these versions allow for varying string lengths without the OR conditions.
EDIT: Updated to support examples in SQL Server and Oracle for ensuring the format A0+, so that A1324, A2342388, and P2342 match but A2342JUNK and A234 do not.
The Oracle REGEXP_LIKE code was borrowed from Mark's post but updated to support 4 or more numeric digits.
Added a custom SQL Server 2008 approach which implements these techniques.
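The examples from the edit note can be checked with a short Python sketch of the A0+ rule (one letter followed by four or more digits). The function name mirrors the SQL function above but is otherwise hypothetical, and Python's re module stands in for the database's pattern matching.

```python
import re

# One ASCII letter, then at least four ASCII digits, and nothing else.
PROPER_FORMAT = re.compile(r"[A-Za-z][0-9]{4,}")


def is_proper_format(data: str) -> bool:
    return PROPER_FORMAT.fullmatch(data) is not None


accepted = [is_proper_format(v) for v in ("A1324", "A2342388", "P2342")]
rejected = [is_proper_format(v) for v in ("A2342JUNK", "A234")]
```

The letter class is spelled A-Za-z on purpose: a range written as A-z also covers the punctuation characters that sit between Z and a in ASCII, such as the underscore.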
Depends on your database. Many have regex functions (note examples not tested so check)
e.g. Oracle
SELECT x
FROM table
WHERE REGEXP_LIKE(x, '^[A-Za-z][[:digit:]]{4}$')
Sybase uses LIKE
Given that you're allowing between 3 and 6 digits for the number in your examples then it's probably better to use the ISNUMERIC() function on the 2nd character onwards:
SELECT *
FROM TheTable
-- start with a letter
WHERE Data LIKE '[A-Za-z]%'
-- everything from 2nd character onwards is a number
AND ISNUMERIC( SUBSTRING( Data, 2, 50 ) ) = 1
-- number doesn't have a decimal place
AND Data NOT LIKE '%.%'
For more information look at the ISNUMERIC function on MSDN.
Also note that:
I've limited the 2nd part with the number to 50 characters maximum, change this to suit your needs.
Strictly speaking you should check for currency symbols etc, as ISNUMERIC allows them, as well as +/- and some others
A better option might be to create a function that checks that each character after the first is between 0 and 9 (or between 48 and 57 if you're comparing ASCII codes).
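A character-by-character check of that kind could look like the following Python sketch. The function name and the 3 to 6 digit limits are assumptions taken from the examples in the question; a T-SQL version would follow the same steps.

```python
def letter_then_digits(data: str, min_digits: int = 3, max_digits: int = 6) -> bool:
    """True if data is one ASCII letter followed only by ASCII digits."""
    if not data:
        return False
    head, tail = data[0], data[1:]
    return (
        head.isascii()
        and head.isalpha()
        and min_digits <= len(tail) <= max_digits
        # Compare each character directly, so signs, currency symbols and
        # decimal points are rejected instead of being treated as numeric.
        and all("0" <= ch <= "9" for ch in tail)
    )


clean = [letter_then_digits(v) for v in ("T1234", "U2468", "K123", "P50054")]
junk = [letter_then_digits(v) for v in ("T12..", ".T12", "MARK", "TP1", "T+123", "T$123")]
```

Unlike a general numeric test, this never has to be combined with extra filters for signs or currency symbols.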
You can't use Regular Expressions in SQL Server, so you have to use OR. Correcting David Andres' answer...
WHERE
(
Data LIKE '[A-Za-z][0-9][0-9][0-9]'
OR
Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9]'
OR
Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9][0-9]'
)
David's answer allows "D1234junk" through
You also only need "[A-Z]" if you don't have case sensitivity
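To see how the three fixed-length patterns behave, here is a Python sketch that uses fnmatch as a stand-in for LIKE. This is only an approximation: fnmatch supports the same bracket classes but uses * and ? where LIKE uses % and _, and fnmatchcase is always case sensitive. None of the patterns here use wildcards, so the difference does not matter for this example.

```python
from fnmatch import fnmatchcase

# One pattern per allowed length: a letter followed by 3, 4 or 5 digits.
PATTERNS = [
    "[A-Za-z][0-9][0-9][0-9]",
    "[A-Za-z][0-9][0-9][0-9][0-9]",
    "[A-Za-z][0-9][0-9][0-9][0-9][0-9]",
]


def is_clean(data: str) -> bool:
    return any(fnmatchcase(data, pattern) for pattern in PATTERNS)


clean = [is_clean(v) for v in ("T1234", "U2468", "K123", "P50054")]
junk = [is_clean(v) for v in ("D1234junk", "T12..", ".T12", "MARK", "TP1")]
```

Because each pattern has no trailing wildcard, anything after the last digit makes the value fail, which is what keeps D1234junk out.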