I want to trim a string to a specified length. If the string is shorter, I don't want to do anything. I found the function substr(), which does the job. However, the Oracle documentation says nothing about what happens if the string is shorter than the requested length.
For example this:
select substr('abc',1,5) from dual;
returns 'abc', which is what I need.
I'd like to ask if this is safe, because the function does not seem to be defined for this usage. Is there a better way to truncate?
It is totally ok, but if you want, you can use this query:
select substr('abc',1,least(5,length('abc'))) from dual;
This is an interesting question. Surprisingly, the documentation doesn't seem to cover this point explicitly.
I think what you are doing is quite safe. substr() is not going to "add" characters to the end of the string when the string is too short. I have depended on this behavior in many databases, including Oracle, over time. This is how similar functions work in other databases and most languages.
The one sort-of-exception would be when the original data type is a char() rather than varchar2() type. In this case, the function would return a string of the same type, so it might be padded with spaces. That, though, is a property of the type not really of the function.
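The padding point can be checked with a quick query. This is a minimal Oracle sketch; the CAST calls are only there to force the two data types and are not part of the original question:

```sql
-- A CHAR(5) value is blank-padded to 5 characters before SUBSTR ever sees it,
-- so the padding comes from the type, not from SUBSTR.
SELECT LENGTH(SUBSTR(CAST('abc' AS CHAR(5)), 1, 5))     AS char_len,      -- 5
       LENGTH(SUBSTR(CAST('abc' AS VARCHAR2(5)), 1, 5)) AS varchar2_len   -- 3
FROM dual;
```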
If you want to be absolutely certain that you won't end up with trailing blanks by using SUBSTR alone (you won't, but sometimes it's comforting to be really sure), you can use:
SELECT RTRIM(SUBSTR('abc',1,5)) FROM DUAL;
Share and enjoy.
It is better to use the below query
SELECT SUBSTR('abc',1,LEAST(5,LENGTH('abc'))) FROM DUAL;
The above query takes either the length of the string or the number 5, whichever is lower.
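For comparison, here is a small Oracle sketch that runs the plain SUBSTR call and the LEAST variant side by side. The literals are illustrative only:

```sql
-- Plain SUBSTR already stops at the end of the string,
-- so the LEAST(...) guard does not change the result.
SELECT SUBSTR('abcdefgh', 1, 5)                   AS long_input,   -- 'abcde'
       SUBSTR('abc', 1, 5)                        AS short_input,  -- 'abc'
       SUBSTR('abc', 1, LEAST(5, LENGTH('abc')))  AS with_least    -- 'abc'
FROM dual;
```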
Related
I have a variable in a stored procedure that contains a string of characters like
[Tag]MESSAGE[/Tag]
I need a way to get the MESSAGE part from within the tags.
Any help would be much appreciated
Note: I have tested it on Oracle RDBMS
A more reliable approach is to use REGEXP_REPLACE.
REGEXP_REPLACE(value, pattern)
Example
SELECT REGEXP_REPLACE(
'<Tag>Message</Tag>',
'\s*</?\w+((\s+\w+(\s*=\s*(".*?"|''.*?''|[^''">\s]+))?)+\s*|\s*)/?>\s*') FROM DUAL;
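Just replace "<" and ">" with "\[" and "\]" if your tags use square brackets. The brackets must be escaped because they are special characters in a regular expression.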
What you need is this:
SELECT SUBSTRING(ColumnName, CHARINDEX('html_tag',ColumnName)+LEN('html_tag'), CHARINDEX('html_close_tag',ColumnName)-CHARINDEX('html_tag',ColumnName)-LEN('html_tag')) FROM TableName
You'll need to replace html_tag and html_close_tag with the opening and closing tags that surround the text you want to extract. The third argument of SUBSTRING is a length, so it is computed as the distance between the end of the opening tag and the start of the closing tag.
If the column contains only a single tag, a simple call to the substring function should be enough. Otherwise, there will always be some point where a regular expression does not suffice, since you fall into a trap (see this legendary StackOverflow answer).
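For the Oracle case in the question, the same idea can be sketched in two ways: a regular expression with a capture group, and plain SUBSTR/INSTR arithmetic. The column alias msg and the inline sample row are assumptions for illustration. The sixth argument of REGEXP_SUBSTR (the subexpression index) needs Oracle 11g or later.

```sql
-- Both expressions return 'MESSAGE' for the sample value.
SELECT REGEXP_SUBSTR(msg, '\[Tag\](.*?)\[/Tag\]', 1, 1, NULL, 1) AS via_regex,
       SUBSTR(msg,
              INSTR(msg, '[Tag]') + LENGTH('[Tag]'),                          -- first character after the opening tag
              INSTR(msg, '[/Tag]') - INSTR(msg, '[Tag]') - LENGTH('[Tag]'))   -- number of characters between the tags
         AS via_substr
FROM (SELECT '[Tag]MESSAGE[/Tag]' AS msg FROM dual);
```

Both forms assume exactly one well-formed pair of tags per value. By default "." does not match a newline in Oracle regular expressions, so a multi-line message would need the 'n' match parameter in place of NULL.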
I have a function in Oracle that I need converted to Postgres.
I can't seem to find a reason for this difference in the docs, but:
oracle:
SELECT substr('1236',-4, 4) FROM DUAL;
Result: 1236
postgres:
SELECT substr('1236',-4, 4);
Result: empty (Null)
I need output similar to Oracle's. I can't seem to understand why the Postgres function differs, and what I can use as an alternative.
I'm a little confused by your confusion. Oracle is quite clear that a negative position counts back from the end of the string.
Nothing in the Postgres documentation suggests this behavior. There is no mention of negative positions (as far as I can tell) for any string functions other than left() and right(). And no hint whatsoever that negative positions have a special meaning in other contexts.
Postgres fortunately has a simpler way to do what you want:
select right('1236', 4)
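A short PostgreSQL sketch of the difference, with illustrative literals. Postgres treats a negative start as a position before the beginning of the string, so the requested window of 4 characters ends before the first real character:

```sql
-- The window starting at -4 with length 4 covers positions -4..-1: nothing to return.
SELECT substr('1236', -4, 4);         -- ''
-- A longer window reaches position 1 and returns just that character.
SELECT substr('1236', -4, 6);         -- '1'

-- Counting from the end, as Oracle's negative position does:
SELECT right('1236', 4);              -- '1236'
SELECT left(right('123456', 4), 2);   -- '34', like Oracle SUBSTR('123456', -4, 2)
```

One edge case to keep in mind when porting: as far as I know, Oracle returns NULL when the negative position reaches back further than the string is long, while right() simply returns the whole string.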
I need a query to remove all alphanumeric characters from a string and give me only special characters.
If string is '##45gr##3' query should give me '####'.
SELECT REGEXP_REPLACE('##45gr##3','[^[:punct:]'' '']', NULL) FROM dual;
The old-fashioned way, with a translate() call:
select translate(upper(txt)
, '.1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ'
, '.') as spec_txt
from t42
/
With a translate() solution it is better to list all the characters we want to exclude. Yes, that is (probably) the larger set, but at least it's well-defined. We don't want to keep revising the code to handle new characters in the input string.
Obviously the regex solution is more compact, and who can deny the elegance of @venkatesh's solution? However, evaluating regular expressions is more expensive than more focused SQL functions, so it's worth knowing alternative ways of doing things for those cases when performance is an issue.
Everything written in comments is most probably very true (especially the 5th one that talks about exceptions, special cases etc.). My feeling is that Jörg knows more about regular expressions than I'll ever know.
Anyway, to cut a long story short, in its very simple appearance, regarding the question posted ("remove all numbers and letters"), something like this might work:
SQL> select regexp_replace('a##45gr##3$.', '[[:digit:]]+|[[:alpha:]]+', '') result
2 from dual;
RESULT
------
####$.
SQL>
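The two character classes can also be merged into one. This is a minimal Oracle sketch of the same idea using [[:alnum:]], which covers both letters and digits:

```sql
-- Remove every letter and digit; whatever is left is "special" by definition.
SELECT REGEXP_REPLACE('a##45gr##3$.', '[[:alnum:]]', '') AS result   -- '####$.'
FROM dual;
```

Note that in Oracle an empty result is NULL, so a string made up entirely of letters and digits comes back as NULL rather than as an empty string.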
In Amazon Redshift tables, I have a string column from which I need to extract only the numbers. Currently I use this:
translate(stringfield, '0123456789'||stringfield, '0123456789')
I was trying out the REPLACE function, but it's not going to be elegant.
Any thoughts on converting the string into ASCII first and then doing some operation to extract only the numbers? Or any other alternatives?
It is hard here, as Redshift does not support functions and is missing a lot of traditional functions.
Edit:
Trying out the below, but it only returns 051-a92, whereas I need 05192 as output. I am thinking of substring etc., but I only have regexp_substr available right now. How do I get rid of any characters in between?
select REGEXP_SUBSTR('somestring-051-a92', '[0-9]+..[0-9]+', 1)
Might be late, but I was solving the same problem and finally came up with this:
select REGEXP_replace('somestring-051-a92', '[a-z/-]', '')
alternatively, you can create a Python UDF now
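A Python UDF for this could look roughly like the sketch below. The function name f_digits_only is hypothetical, and the example assumes your cluster allows scalar Python UDFs (language plpythonu):

```sql
-- Hypothetical scalar UDF: keep only the ASCII digits of the input string.
create or replace function f_digits_only (s varchar)
  returns varchar
immutable
as $$
  import re
  if s is None:
      return None                      # pass NULL through unchanged
  return re.sub(r'[^0-9]', '', s)      # drop everything that is not 0-9
$$ language plpythonu;

select f_digits_only('somestring-051-a92');   -- '05192'
```

A UDF is easy to reuse, but the built-in regexp_replace with '[^0-9]' does the same job without the overhead of calling into Python.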
Typically your inputs will conform to some sort of pattern that can be used to do the parsing using SUBSTRING() with CHARINDEX() { aka STRPOS(), POSITION() }.
E.g. find the first hyphen and the second hyphen and take the data between them.
If not (and assuming your character range is limited to ASCII) then your best bet would be to nest 26+ REPLACE() functions to remove all of the standard alpha characters (and any punctuation as well).
If you have multibyte characters in your data though then this is a non-starter.
A better method is to remove all the non-numeric characters:
select REGEXP_replace('somestring-051-a92', '[^0-9]', '')
You can specify "any non digit" that includes non-printable, symbols, alpha, etc.
e.g., regexp_replace('brws--A*1','[\D]')
returns
"1"
I have a varchar column in a table that is used to store xml data. Yeah I know there is an xml data type that I should be using, but I think this was set up before the xml data type was available so a varchar is what I have to use for now. :)
The data stored looks similar to the following:
<xml filename="100100_456_484351864768.zip"
event_dt="10/5/2009 11:42:52 AM">
<info user="TestUser" />
</xml>
I need to parse the filename to get the digits between the two underscores which in this case would be "456". The first part of the file name "shouldn't" change in length, but the middle number will. I need a solution that would work if the first part does change in length (you know it will change because "shouldn't change" always seems to mean it will change).
For now, I'm using XQuery to pull out the filename because I figured this is probably better than straight string manipulation. I cast the string to xml to do this, but I'm not an XQuery expert, so of course I'm running into issues. I found a function for XQuery (substring-before), but was unable to get it to work (I'm not even sure that function will work with SQL Server). There might be an XQuery function to do this easily, but if there is, I am unaware of it.
So, I get the filename from the table with a query similar to the following:
select CAST(parms as xml).query('data(/xml/@filename)') as p
from Table1
From this I'd assume that I'd be able to CAST this back to a string then do some instring or charindex function to figure out where the underscores are so that I can encapsulate all of that in a substring function to pick out the part I need. Without going too far into this I am pretty sure that I can eventually get it done this way, but I know that there has to be an easier way. This way would make a huge unreadable field in the SQL Statement which even if I moved it to a function would still be confusing to try to figure out what is going on.
I'm sure there is an easier way than this, since it seems to be simple string manipulation. Perhaps someone can point me in the right direction. Thanks
You can use XQuery for this - just change your statement to:
SELECT
CAST(parms as xml).value('(/xml/@filename)[1]', 'varchar(260)') as p
FROM
dbo.Table1
That gives you a VARCHAR(260) long enough to hold any valid file name and path - now you have a string and can work on it with SUBSTRING etc.
Marc
The straightforward way to do this is with SUBSTRING and CHARINDEX. Assuming (wise or not) that the first part of the filename doesn't change length, but that you still want to use XQuery to locate the filename, here's a short repro that does what you want:
declare @t table (
parms varchar(max)
);
insert into @t values ('<xml filename="100100_456_484351864768.zip" event_dt="10/5/2009 11:42:52 AM"><info user="TestUser" /></xml>');
with T(fName) as (
select cast(cast(parms as xml).query('data(/xml/@filename)') as varchar(100)) as p
from @t
)
select
substring(fName,8,charindex('_',fName,8)-8) as myNum
from T;
There are sneaky solutions that use other string functions like REPLACE and PARSENAME or REVERSE, but none is likely to be more efficient or readable. One possibility to consider is writing a CLR routine that brings regular expression handling into SQL.
By the way, if your xml is always this simple, there's no particular reason I can see to use XQuery at all. Here are two queries that will extract the number you want. The second is safer if you don't have control over extra white space in your xml string or over the possibility that the first part of the file name will change length:
select
substring(parms,23,charindex('_',parms,23)-23) as myNum
from @t;
select
substring(parms,charindex('_',parms)+1,charindex('_',parms,charindex('_',parms)+1)-charindex('_',parms)-1) as myNum
from @t;
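The two approaches can be combined: XQuery locates the filename, and CHARINDEX finds the underscores so the prefix length no longer matters. This is a self-contained SQL Server sketch; the aliases F, a, b, p1 and p2 are made up for readability and are not from the original answers:

```sql
declare @t table (parms varchar(max));
insert into @t values ('<xml filename="100100_456_484351864768.zip" event_dt="10/5/2009 11:42:52 AM"><info user="TestUser" /></xml>');

with F(fName) as (
    -- pull the filename attribute out as a plain string
    select cast(parms as xml).value('(/xml/@filename)[1]', 'varchar(260)')
    from @t
)
select substring(fName, p1 + 1, p2 - p1 - 1) as myNum                 -- '456'
from F
cross apply (select charindex('_', fName) as p1) a                    -- position of the first underscore
cross apply (select charindex('_', fName, p1 + 1) as p2) b;           -- position of the second underscore
```

Naming the two positions once with CROSS APPLY avoids repeating the nested charindex calls. The sketch assumes every filename contains at least two underscores; otherwise the computed length is negative and SUBSTRING raises an error.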
Unfortunately, SQL Server is not a conformant XQuery implementation; rather, it's a fairly limited subset of a draft version of the XQuery spec. Not only does it lack fn:substring-before, it also lacks fn:index-of (which would let you do it yourself using fn:substring) and fn:string-to-codepoints. So, as far as I can tell, you're stuck with SQL here.