Remove # characters from arrays in PostgreSQL table? - sql

I have a field (of type character varying) called 'directedlink_href' in a table which contains arrays that have values that all start with a '#' character.
How am I able to remove the '#' character from any entries in these arrays in this field?
For instance...
{#osgb4000000030451486,#osgb4000000030451491}
to
{osgb4000000030451486,osgb4000000030451491}

The clean solution is to unnest, replace and then re-aggregate the values:
select id,
(select array_agg(substr(x.val,2) order by x.idx) from unnest(t1.directedlink_href) with ordinality as x(val,idx)) as data
from the_table t1;
If you want to actually change the data in the table:
update the_table t1
set directedlink_href = (select array_agg(substr(x.val,2) order by x.idx) from unnest(t1.directedlink_href) with ordinality as x(val,idx));
This simply strips off the first character, whatever it is. If some values might not start with '#', use regexp_replace(x.val, '^#', '') instead of substr(x.val, 2).
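As a rough sketch of that regexp_replace variant (PostgreSQL, assuming directedlink_href really is an array column such as text[] and reusing the the_table and x(val, idx) names from the answer above): the WHERE clause is an addition that is not part of the original answer. It skips rows with nothing to strip, and it also avoids turning empty arrays into NULL, which array_agg would otherwise do when unnest produces no rows.

```sql
-- Strip a leading '#' only where one is present; other elements are left unchanged.
update the_table t1
set directedlink_href = (
    select array_agg(regexp_replace(x.val, '^#', '') order by x.idx)
    from unnest(t1.directedlink_href) with ordinality as x(val, idx)
)
-- Assumption: only touch rows that have at least one element starting with '#'.
where exists (
    select 1
    from unnest(t1.directedlink_href) as u(val)
    where u.val like '#%'
);
```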

@a_horse_with_no_name got my upvote for a cleaner and more "Postgres-ish" solution.
I was about to delete this answer, but after some tests it seems that, performance-wise, this solution has an advantage.
Therefore I will leave this solution here, but I do recommend choosing the solution of @a_horse_with_no_name as the right answer.
I'm using chr(1) as a character that most likely does not appear in the array's elements.
select string_to_array(substr(replace(array_to_string(directedlink_href,chr(1)),chr(1)||'#',chr(1)),2),chr(1))
from t
;

I think this is a simpler and more generic solution, so I thought I'd share it. Note that it removes every '#' in the values, not only a leading one:
SELECT regexp_split_to_array(regexp_replace(array_to_string(ARRAY['#osgb4000000030451486','#osgb4000000030451491'], '__DELIMITER__'), '#', '', 'g'), '__DELIMITER__');

Related

Oracle SQL replace

Unfortunately I don't have the possibility to change field type.
I would like to REPLACE a , to . in a Typ=1 type of field (e.g.: 4,37 so in the end it should be 4.37), and I've tried CAST() and TO_NUMBER and TO_CHAR and I don't even know what else also, but I keep getting the ORA-01722 and it drives me crazy already. Why does it have to be a number for replacing ???
SELECT REPLACE(fmm, ',', '.') fmm FROM ...
Or do you have a better idea how can I do it without REPLACE maybe ?
UPDATE: it seems he has a problem with:
ORDER BY TO_NUMBER(fmm, '99D99')
So it seems he is taking the replaced version, so with . of fmm, but why ????
Try removing the commas with replace(nvl(nr,0),',','') and then formatting the result with to_char:
with tab as
(
select '1,234,567' as nr
from dual
)
select to_char(
replace(nvl(nr,0),',','')
,'fm999G999G990','NLS_NUMERIC_CHARACTERS = '',.''')
as "Number"
from tab;
Number
----------
1.234.567
Passing a string (varchar2) value into the replace function cannot throw an ORA-01722.
it seems he has a problem with:
ORDER BY TO_NUMBER(fmm, '99D99')
If that's complaining when fmm is '4,37' then you could add a replace() call inside the to_number(), but it's simpler and clearer to specify NLS_NUMERIC_CHARACTERS as part of the conversion, so it knows that D is represented by a comma and doesn't rely on the session settings:
order by to_number(fmm, '99D99', 'NLS_NUMERIC_CHARACTERS=,.')
If your table has a mix of values with period and comma decimal separators then you need to fix the data - this is the main reason you should not be storing numbers as strings in the first place. If you can't fix the data then you can work around it with replace(), but it isn't ideal; you can then use a fixed period as the decimal character:
order by to_number(replace(fmm, ',', '.'), '99.99')
or still specify NLS_NUMERIC_CHARACTERS:
order by to_number(replace(fmm, ',', '.'), '99D99', 'NLS_NUMERIC_CHARACTERS=.,')
Either way, that is 'normalising' all the strings to have only periods, with no commas, which allows them all to be converted.
what I don't understand, if I do some changes in the SELECT to a field, how can it affect the ORDER BY section? fmm should still remain 4,37 and not 4.37 in the ORDER BY section, shouldn't it?
No, because you gave the column expression REPLACE(fmm, ',', '.') the alias fmm, which is the same as the original column name; the order-by clause is the only place column aliases are allowed, and there the alias masks the original table column. When you do:
ORDER BY TO_NUMBER(fmm, '99D99')
the fmm in that conversion is the value of the column expression aliased as fmm, and not the original table column.
You can still access the table column, but to do so you have to prefix it with the table name or alias, as the column expression from the select list takes precedence (which is implied but not stated clearly in the docs):
expr orders rows based on their value for expr. The expression is based on columns in the select list or columns in the tables, views, or materialized views in the FROM clause.
So you can either explicitly refer to the table column via the table name or, here, an alias:
SELECT REPLACE(t.fmm, ',', '.') fmm
FROM your_table t
ORDER BY TO_NUMBER(t.fmm, '99D99')
though you still shouldn't rely on the session NLS settings really, so can/should still specify the NLS option to match the table column format:
SELECT REPLACE(t.fmm, ',', '.') fmm
FROM your_table t
ORDER BY TO_NUMBER(t.fmm, '99D99', 'NLS_NUMERIC_CHARACTERS=,.')
or use the replaced value and specify the NLS option for that (notice the option itself is different):
SELECT REPLACE(fmm, ',', '.') fmm
FROM your_table
ORDER BY TO_NUMBER(fmm, '99D99', 'NLS_NUMERIC_CHARACTERS=.,')
If your table has a mix of period and comma values then you need to use the column-alias version so it is consistent when it tries to convert. If you only have commas then you can use either. (But again, you shouldn't be storing numbers as strings in the first place...)
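One way to sidestep the masking altogether is to give the select-list expression an alias that differs from the column name. The following is only a sketch with made-up inline sample values; the alias fmm_display is hypothetical, and the CTE stands in for your_table:

```sql
-- Sample data with mixed decimal separators (invented values for illustration).
with your_table as (
  select '4,37' as fmm from dual union all
  select '12.5' from dual union all
  select '0,99' from dual
)
select replace(t.fmm, ',', '.') as fmm_display   -- alias differs from the column name
from your_table t
-- Normalise to periods, then tell TO_NUMBER explicitly that D is a period.
order by to_number(replace(t.fmm, ',', '.'), '99D99', 'NLS_NUMERIC_CHARACTERS=.,');
```

Because the alias no longer collides with fmm, the ORDER BY unambiguously refers to the table column, and the rows should come back in numeric order (0.99, 4.37, 12.5) regardless of the session NLS settings.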

Select rows that have mixed characters in a single value e.g. 'Joh?n' in name column

In an oracle table:
1- a value in a VARCHAR column contains characters that are not letters.
Consider a scenario where a name in the 'last_name' column contains regular characters (A - Z, a - z) as well as characters that are not English letters (e.g. '.', '-', ' ', '_', '>' or similar).
The challenge is to select the rows that have names in 'last_name' such as '.John' or 'John.' or '-John' or 'Joh-n'.
2- Is it possible to have non-date values in a Date defined column? If yes, how can such records be selected in an oracle query?
Thanks!
I believe this will do the trick:
SELECT * FROM mytable WHERE REGEXP_LIKE(last_name, '[^A-Za-z]');
As for your 2nd question, I am unsure. I would be glad if someone else could add to my answer to cover it. I have found this website, though, that might be of help: http://infolab.stanford.edu/~ullman/fcdb/oracle/or-time.html
It explains the DATE format.
If I properly understand your goal, you need to select rows with last_name column containing the name 'John', but it may also have additional characters before, after, or even inside the name. In that case, this should be helpful:
select * from tab where regexp_replace(last_name, '[^A-Za-z]+', '') = 'John'

SQL select from list where white space has been added to end

I'm trying to select some rows from an Oracle database like so:
select * from water_level where bore_id in ('85570', '112205','6011','SP068253');
This used to work fine but a recent update has meant that bore_id in water_level has had a bunch of whitespace added to the end for each row. So instead of '6011' it is now '6011 '. The number of space characters added to the end varies from 5 to 11.
Is there a way to edit my query to capture the bore_id in my list, taking into account that trailing whitespace should be ignored?
I tried:
select * from water_level where bore_id in ('85570%', '112205%','6011%','SP068253%');
which returns more rows than I want, and
select * from water_level where bore_id in ('85570\s*', '112205\s*','6011\s*', 'SP068253\s*');
which didn't return anything?
Thanks
JP
You should apply RTRIM to the column in the WHERE clause:
select * from water_level where RTRIM(bore_id) in ('85570', '112205','6011');
To add to that, RTRIM accepts an optional second parameter specifying which characters to trim, so if the trailing characters weren't spaces you could still remove them. For example, if the data looked like 85570xxx, you could use:
select * from water_level where RTRIM(bore_id, 'x') IN ('85570','112205', '6011');
You could use the replace function to remove the spaces
select * from water_level where replace(bore_id, ' ', '') in ('85570', '112205', '6011', 'SP068253');
Although, a better option would be to remove the spaces from the data if they are not supposed to be there or create a view.
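A minimal sketch of both ideas for Oracle, assuming the trailing spaces are genuinely unwanted and that you are allowed to modify water_level. The index name is hypothetical, and the function-based index is an alternative not mentioned in the answers above, for the case where the data cannot be changed:

```sql
-- One-off cleanup: strip trailing spaces from the stored values.
update water_level
set bore_id = rtrim(bore_id)
where bore_id like '% ';   -- only rows that actually end in a space

-- Alternative if the data must stay as it is: a function-based index
-- (hypothetical name) so that predicates on RTRIM(bore_id) can use an index.
create index water_level_bore_id_trim_ix on water_level (rtrim(bore_id));
```

With the index in place, the RTRIM(bore_id) IN (...) form of the query can stay unchanged.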
I'm guessing bore_id is VARCHAR or VARCHAR2. If it were CHAR, Oracle would use (SQL-standard) blank-padded comparison semantics, which regards 'foo' and 'foo ' as equivalent.
So, another approach is to force comparison as CHARs:
SELECT *
FROM water_level
WHERE CAST(bore_id AS CHAR(16)) IN ('85570', '112205', '6011', 'SP068253');

replace two characters in one cell

I am using this query to replace one character in a cell
select replace(id,',','')id from table
But I want to replace two characters in a cell.
If the cell is having this data (1,3.1), and I want it to look like this (131).
How can I replace two different characters in one cell?
Use TRANSLATE() instead of REPLACE(). It replaces each occurrence of a character in the first string with the character at the same position in the second. To remove characters, simply cut the replacement string short:
select translate(id, '1,.', '1') id from table
Note that the second string cannot be null. Hence the need to include 1 (or some other character) in both strings.
Obviously the more characters you need to convert/remove the more attractive TRANSLATE() becomes. The main use for REPLACE is changing patterns (such as words) rather than individual characters.
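To see the positional mapping in isolation, here is a small sketch against DUAL using the sample value from the question. The second query uses 'x' as an arbitrary placeholder (an assumption for illustration) to show that the kept character does not have to occur in the data:

```sql
-- '1' maps to '1'; ',' and '.' have no counterpart in the second string, so they are removed.
select translate('1,3.1', '1,.', '1') as cleaned from dual;   -- 131

-- Same result with a placeholder character that never appears in the value.
select translate('1,3.1', 'x,.', 'x') as cleaned from dual;   -- 131
```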
Can use
select replace(translate(id,',.',' '),' ','') from table;
or
select regexp_replace('1,3.1','[,.]','') from dual;
or
select replace(replace(id,',',''),'.','') from table;
Call the replace again.
select replace(replace(id,',',''), '.','') id from table
Do this:
select REPLACE(REPLACE(id,',',''),'.','')
Or use a regular expression:
select regexp_replace(id, '[.,]', '') id from table

Oracle: a query, which counts occurrences of all non alphanumeric characters in a string

What would be the best way to count occurrences of all non alphanumeric characters that appear in a string in an Oracle database column.
When attempting to find a solution I realised I had a query that was unrelated to the problem, but I noticed I could modify it in the hope to solve this problem. I came up with this:
SELECT COUNT (*), SUBSTR(TITLE, REGEXP_INSTR(UPPER(TITLE), '[^A-Z,^0-9]'), 1)
FROM TABLE_NAME
WHERE REGEXP_LIKE(UPPER(TITLE), '[^A-Z,^0-9]')
GROUP BY SUBSTR(TITLE, REGEXP_INSTR(UPPER(TITLE), '[^A-Z,^0-9]'), 1)
ORDER BY COUNT(*) DESC;
This works to find the FIRST non alphanumeric character, but I would like to count the occurrences throughout the entire string, not just the first occurrence. E. g. currently my query analysing "a (string)" would find one open parenthesis, but I need it to find one open parenthesis and one closed parenthesis.
There is an obscure Oracle TRANSLATE function that will let you do that instead of regexp:
select a.*,
length(translate(lower(title),'.0123456789abcdefghijklmnopqrstuvwxyz','.'))
from table_name a
Try this:
SELECT a.*, LENGTH(REGEXP_REPLACE(TITLE, '[a-zA-Z0-9]', ''))
FROM TABLE_NAME a
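One caveat with the LENGTH-based approach: in Oracle an empty string is NULL, so a title with no non-alphanumeric characters yields NULL rather than 0. If you are on 11g or later, REGEXP_COUNT is a possible alternative that returns 0 in that case. A sketch with invented sample titles:

```sql
-- Sample titles (made up for illustration); replace the CTE with your own table.
with d as (
  select 'a (string)' as title from dual union all
  select 'plain' from dual
)
select title,
       regexp_count(title, '[^A-Za-z0-9]') as non_alnum_count
from d;
```

For 'a (string)' this counts the space and both parentheses (3), and for 'plain' it gives 0. Like the queries above, it returns a total per row, not a breakdown per character.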
The best option, as you discovered, is to use a PL/SQL procedure. I don't think there's any way to write a regular expression that will return multiple counts like you're expecting (at least, not in Oracle).
One way to get around this is to use a hierarchical (CONNECT BY) query to examine each character individually, which returns a row for each character found. The following example will work for a single row:
with d as (
select '(1(2)3)' as str_value
from dual)
select char_value, count(*)
from (select substr(str_value,level,1) as char_value
from d
connect by level <= length(str_value))
-- plain negated class; '[^A-Z,^0-9]' would also treat ',' and '^' as alphanumeric
where regexp_instr(upper(char_value), '[^A-Z0-9]', 1) <> 0
group by char_value;