How to count a line break as 2 characters in Oracle SQL

I have data stored in a table column, and it contains a line break. When I count the length of the string, it returns the count just fine. I want to change this and count the line break as 2 characters, so if the data in the table is something like this:
This
That
This should return a length of 10. It currently returns 9, which is understandable, but I want to count the line break as 2 characters. So if there are 2 line breaks in the data, they should count as 4 characters.
How can I achieve this?
I want to use this in SUBSTR(COL, 1, 7).
By counting the line break as 2 characters, it should return data like this:
This
T
I hope someone can help.

Just replace the newline in the string with 2 characters (for example 'xx') before counting the string length. For more information on replacing newlines in Oracle, see the question "Oracle REPLACE() function isn't handling carriage-returns & line-feeds".
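Here is a small Python sketch of the same replacement, run outside Oracle. `str.replace` plays the role of REPLACE and `len` plays the role of LENGTH. The line break is assumed to be a bare line feed (CHR(10)), and the helper name is hypothetical.

```python
def length_counting_newline_as_two(value: str) -> int:
    """Length of value with each line feed counted as 2 characters.

    Mirrors LENGTH(REPLACE(value, CHR(10), 'xx')); 'xx' is only a
    2-character placeholder used for counting.
    """
    return len(value.replace("\n", "xx"))


print(length_counting_newline_as_two("This\nThat"))        # 10
print(length_counting_newline_as_two("This\nThat\nMore"))  # 16
```

The placeholder is only useful for counting. Replacing the line break with 'xx' would change the text that SUBSTR returns.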

Update your value to have a carriage return character (CHR(13)) before the line feed character (CHR(10)), so that each line break is stored, and counted, as 2 characters.
So if you have the table:
CREATE TABLE test_data ( value VARCHAR2(20) );
INSERT INTO test_data ( value ) VALUES ( 'This
That' );
Then you can insert the CR before each LF:
UPDATE test_data
SET value = REPLACE( value, CHR(10), CHR(13) || CHR(10) )
WHERE INSTR( value, CHR(10) ) > 0;
Then your query:
SELECT SUBSTR( value, 1, 7 ) FROM test_data;
Outputs:
| SUBSTR(VALUE,1,7) |
| :---------------- |
| This |
| T |
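The effect of the UPDATE can be checked outside the database with a short Python sketch. It assumes the stored value contains bare LF line breaks, as in the example above. `to_crlf` and `oracle_substr` are hypothetical helpers that mimic REPLACE and SUBSTR (with a positive, 1-based start) for this case only.

```python
CR, LF = "\r", "\n"  # CHR(13) and CHR(10)


def to_crlf(value: str) -> str:
    """Mimic REPLACE(value, CHR(10), CHR(13) || CHR(10))."""
    return value.replace(LF, CR + LF)


def oracle_substr(value: str, start: int, length: int) -> str:
    """Mimic SUBSTR(value, start, length) for a positive 1-based start."""
    return value[start - 1:start - 1 + length]


stored = to_crlf("This\nThat")
print(len(stored))                        # 10
print(repr(oracle_substr(stored, 1, 7)))  # 'This\r\nT'
```

The WHERE clause only checks for CHR(10). Running the UPDATE a second time would therefore add another CR before each LF. This matters if some rows may already contain CR LF pairs.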

Related

Substring after splitting by a separator in Oracle

For example, if I have the string "123456,852369,7852159,1596357",
the output I'm looking for is "1234,8523,7852,1596".
The requirement is to keep the first 4 characters of every ','-separated item,
like a split, a substring, and then a concatenation again.
select
REGEXP_REPLACE('MEDA,MEDA,MEDA,MEDA,MEDA,MEDA,MEDA,MEDA,MDCB,MDCB,MDCB,MDCB,MDCB,MDCB', '([^,]+)(,\1)+', '\1')
from dual;
we want to collect 4 char after every ',' separator
Here is an approach using regexp_replace:
select regexp_replace(
'123456,852369,7852159,1596357',
'([^,]{4})[^,]*(,|$)',
'\1\2'
)
from dual
Regexp breakdown:
([^,]{4}) 4 characters other than "," (capture that group as \1)
[^,]* 0 to n characters other than "," (no capture)
(,|$) either character "," or the end of string (capture this as \2)
The function replaces each match with capture 1 (the 4 characters we want) followed by capture 2 (the separator, if there is one).
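Python's `re` module supports the same constructs used here: a negated character class, the `{4}` quantifier, alternation with `$`, and back-references in the replacement. That means the pattern can be tried unchanged in a quick sketch. This runs on Python's regex engine, not Oracle's, so treat it as an approximation. The function name is hypothetical.

```python
import re

# Same pattern as above: capture 4 non-comma characters (\1), skip the rest
# of the item, keep the separator or end of string (\2).
PATTERN = r"([^,]{4})[^,]*(,|$)"


def first_four_per_item(s: str) -> str:
    return re.sub(PATTERN, r"\1\2", s)


print(first_four_per_item("123456,852369,7852159,1596357"))  # 1234,8523,7852,1596
```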
Demo:
RESULT
1234,8523,7852,1596
One option might be to split the string, extract 4 characters and aggregate them back:
with test (col) as
  (select '123456,852369,7852159,1596357' from dual)
-- take each comma-separated item, then its first 4 characters
select listagg(substr(regexp_substr(col, '[^,]+', 1, level), 1, 4), ',')
         within group (order by level) result
from test
connect by level <= regexp_count(col, ',') + 1;
RESULT
--------------------------------------------------------------------------------
1234,8523,7852,1596
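The split, substring and aggregate idea maps directly onto Python's `str.split`, slicing and `str.join`. Below is a minimal sketch with a hypothetical function name. Items shorter than 4 characters are kept as they are.

```python
def first_four_by_splitting(s: str) -> str:
    # Split on commas, keep at most the first 4 characters of each item, join back.
    return ",".join(item[:4] for item in s.split(","))


print(first_four_by_splitting("123456,852369,7852159,1596357"))  # 1234,8523,7852,1596
```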
With REGEXP_REPLACE:
select regexp_replace(the_string, '(^|,)([^,]{4})[^,]*', '\1\2')
from mytable;
This looks for the beginning of the string or a comma, then four characters that are not a comma, then any number of trailing characters that are not a comma.
It only keeps the beginning or the comma, plus the four characters that follow.
Demo: https://dbfiddle.uk/efUFvKyO
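Here is a quick Python check of this second pattern. It again uses Python's `re`, not Oracle's engine, and the function name is made up. The second call shows that an item shorter than 4 characters is left unchanged, because `[^,]{4}` cannot match it.

```python
import re


def first_four_anchored(s: str) -> str:
    # \1 is the start of the string (empty) or the comma; \2 is the 4 kept characters.
    return re.sub(r"(^|,)([^,]{4})[^,]*", r"\1\2", s)


print(first_four_anchored("123456,852369,7852159,1596357"))  # 1234,8523,7852,1596
print(first_four_anchored("12,345678"))                      # 12,3456
```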

Same string but different byte length

I'm having some issues here trying to select some data.
I have a trigger that inserts into another table, but only if the string value doesn't already exist in that table, so I validate before inserting by counting:
SELECT COUNT(*) INTO v_puesto_x_empresa FROM p_table_1
WHERE PUE_EMP_ID = Z.PIN_CODEMP
AND PUE_NOMBRE = v_puesto_nombre;
If the count above is less than 1 (i.e. 0), the process is allowed to insert the data into the corresponding table.
As it turns out, it is duplicating the data for some strange reason, so I checked the source.
I use a cursor that prepares the data I need to insert, and I noticed that for some strings, even though they are the same, it treats them as different strings.
select
UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci')))) PIN_DESCRIPCION,
LENGTHB(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) LEGTHB,
LENGTH(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "NORMAL LENGTH",
LENGTHC(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "LENGTH C",
LENGTH2(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "LENGTH 2"
FROM PES_PUESTOS_INTERNOS
where pin_codemp = '8F90CF5D287E2419E0530200000AA716'
group by PIN_DESCRIPCION
order by PIN_DESCRIPCION asc
;
These are the results, as text:
| PIN_DESCRIPCION | LEGTHB | NORMAL LENGTH | LENGTH C | LENGTH 2 |
|-------------------------------------|--------|---------------|----------|----------|
| ADMINISTRADOR DE PROCESOS | 27 | 27 | 27 | 27 |
| ADMINISTRADOR DE PROCESOS Y CALIDAD | 36 | 36 | 36 | 36 |
| AFORADOR | 9 | 9 | 9 | 9 |
| AFORADOR | 10 | 10 | 10 | 10 |
| ASISTENTE ADMINISTRATIVO | 25 | 25 | 25 | 25 |
| ASISTENTE ADMINISTRATIVO | 26 | 26 | 26 | 26 |
So my guess is that, for some reason, even though the strings look the same, they are treated as different internally.
Note: this table stores user input, so values that are meant to be the 'same word' may differ in some characters, such as:
user input 1: aforador
user input 2: Aforador
For that reason, I applied the following piece of code so that case variants are processed as one word (string):
UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))
For example, if I query the same data without that expression, I get the following result:
PIN_DESCRIPCION
-----------------------------------
administrador de procesos
administrador de procesos
Administrador de Procesos y Calidad
Aforador
aforador
aforador
I'd appreciate any help with this issue.
Thanks in advance.
My best regards.
These strings are not the same. The second one has an extra character that is (obviously) not being displayed.
If you do:
SELECT PIN_DESCRIPCION,
DUMP(PIN_DESCRIPCION)
FROM PES_PUESTOS_INTERNOS
WHERE pin_codemp = '8F90CF5D287E2419E0530200000AA716'
GROUP BY PIN_DESCRIPCION
ORDER BY PIN_DESCRIPCION asc;
Then you will see the binary values that comprise the data and should see that there is an extra trailing character. This could be:
Whitespace (which you can remove with RTRIM);
A zero-width character;
A NUL (ASCII 0) character;
Or something else that is not being displayed.
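To see what DUMP reports without a database, here is a rough Python imitation. It assumes a single-byte database character set, so one character equals one byte, and it hard-codes `Typ=1` (VARCHAR2). It is a hypothetical helper, not Oracle's implementation.

```python
def dump(value: str, encoding: str = "ascii") -> str:
    """Rough imitation of DUMP(value) for a VARCHAR2 in a single-byte character set."""
    raw = value.encode(encoding)
    return f"Typ=1 Len={len(raw)}: " + ",".join(str(b) for b in raw)


for v in ["AFORADOR", "AFORADOR\x00", "AFORADOR\x01"]:
    print(dump(v))
```

The trailing `0` or `1` byte shows up in the byte list even though nothing visible is printed for it.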
For example, if you have a function to create a string from bytes (passed as hexadecimal values):
CREATE FUNCTION createString( hex VARCHAR2 ) RETURN VARCHAR2 DETERMINISTIC
IS
value VARCHAR2(50);
BEGIN
DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), value );
RETURN value;
END;
/
Then we can create the sample data:
CREATE TABLE table_name ( value VARCHAR2(50) )
/
INSERT INTO table_name ( value )
SELECT 'AFORADOR' FROM DUAL UNION ALL
SELECT 'AFORADOR' || createString( '00' ) FROM DUAL UNION ALL
SELECT 'AFORADOR' || createString( '01' ) FROM DUAL
/
Then, if we use:
SELECT value,
DUMP( value ) AS dump,
LENGTH(TRIM(value)) AS length
FROM table_name
This outputs:
| VALUE | DUMP | LENGTH |
|-----------|----------------------------------------|--------|
| AFORADOR | Typ=1 Len=8: 65,70,79,82,65,68,79,82 | 8 |
| AFORADOR | Typ=1 Len=9: 65,70,79,82,65,68,79,82,0 | 9 |
| AFORADOR | Typ=1 Len=9: 65,70,79,82,65,68,79,82,1 | 9 |
(Yes, the alignment of the table is messed up ... that's because of the unprintable characters in the string.)
Update:
From comments:
What is the output of SELECT DUMP(pin_descripcion) FROM <rest of your query> for those rows?
These are the results:
Aforador Typ=1 Len=8: 65,102,111,114,97,100,111,114
aforador Typ=1 Len=9: 97,102,111,114,97,100,111,114,0
Look at the difference in the byte values.
The first characters are 65 (or A) and 97 (or a); however, by using UPPER, you are masking that difference in your output. If you use UPPER in the GROUP BY then the difference in case will not matter.
Additionally, the second one has an extra byte at the end with the value 0 (ASCII NUL), so, regardless of case, they are different strings. This extra last character is not printable, so you won't see the difference in the output. It is also used as a string terminator, so it may have unexpected side-effects; for example, trying to display a NUL character in a db<>fiddle hangs the output. But it does mean that the strings are different, so GROUP BY will not aggregate them together.
You can do:
CREATE FUNCTION createStringFromHex( hex VARCHAR2 ) RETURN VARCHAR2 DETERMINISTIC
IS
value VARCHAR2(50);
BEGIN
DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), value );
RETURN value;
END;
/
Then, to remove the NUL string terminators:
UPDATE PES_PUESTOS_INTERNOS
SET PIN_DESCRIPCION = RTRIM( PIN_DESCRIPCION, createStringFromHex( '00' ) )
WHERE SUBSTR( PIN_DESCRIPCION, -1 ) = createStringFromHex( '00' );
Then you should be able to do:
SELECT UPPER(PIN_DESCRIPCION) AS PIN_DESCRIPCION,
LENGTH(UPPER(PIN_DESCRIPCION)) AS LENGTH
FROM PES_PUESTOS_INTERNOS
--where pin_codemp = '8F90CF5D287E2419E0530200000AA716'
group by UPPER(PIN_DESCRIPCION)
order by UPPER(PIN_DESCRIPCION) asc;
and only see single rows.
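The cleanup can be sketched in Python. `str.rstrip` with a NUL argument plays the role of RTRIM with CHR(0), `str.upper` plays the role of UPPER, and a set stands in for GROUP BY. The three sample values are an assumption modelled on the DUMP output above.

```python
NUL = "\x00"  # CHR(0)

values = ["Aforador", "aforador" + NUL, "aforador"]  # assumed sample data

# UPPER alone still leaves two distinct keys, because one value ends in NUL.
print(len({v.upper() for v in values}))  # 2

# RTRIM(value, CHR(0)) followed by UPPER: one key per word.
cleaned = sorted({v.rstrip(NUL).upper() for v in values})
print(cleaned)  # ['AFORADOR']
```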

How to pull a value in between multiple values?

I have a column named Concatenated Segments which has 12 segment values, and I'm looking to edit the formula on the column to only show the 5th segment. The segments are separated by periods.
How would I need to edit the formula to do this?
Would using a substring work?
Alternatively, use the good old SUBSTR + INSTR combination. It is possibly faster on large data sets, and it doesn't care whether the segments are uninterrupted strings of digits (they can contain anything between the dots):
WITH
  -- thank you for typing, #marcothesane
  indata(s) AS (
    SELECT '1201.0000.5611005.0099.211003.0000.2199.00099.00099.0000.0000.00000' FROM dual
  )
select substr(s, instr(s, '.', 1, 4) + 1,
              instr(s, '.', 1, 5) - instr(s, '.', 1, 4) - 1
             ) result
from indata;
RESULT
------
211003
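The SUBSTR + INSTR arithmetic can be traced in Python. `instr` below is a hypothetical helper that mimics Oracle's INSTR for a positive start and occurrence: it returns a 1-based position, or 0 when nothing is found. `nth_segment` then applies the same formula as the query.

```python
S = "1201.0000.5611005.0099.211003.0000.2199.00099.00099.0000.0000.00000"


def instr(s: str, sub: str, start: int = 1, occurrence: int = 1) -> int:
    """Mimic Oracle INSTR for positive start/occurrence: 1-based, 0 if absent."""
    idx = start - 2  # the first search then begins at 0-based index start - 1
    for _ in range(occurrence):
        idx = s.find(sub, idx + 1)
        if idx == -1:
            return 0
    return idx + 1


def nth_segment(s: str, n: int, sep: str = ".") -> str:
    """SUBSTR(s, INSTR(s, sep, 1, n-1) + 1, INSTR(s, sep, 1, n) - INSTR(s, sep, 1, n-1) - 1).

    Like the query, this assumes n >= 2 and that a separator follows the n-th segment.
    """
    left = instr(s, sep, 1, n - 1)  # 1-based position of the (n-1)-th dot
    right = instr(s, sep, 1, n)     # 1-based position of the n-th dot
    # 1-based start left + 1 is 0-based index left; length right - left - 1 ends at right - 1
    return s[left:right - 1]


print(nth_segment(S, 5))  # 211003
```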
Use REGEXP_SUBSTR(), searching for the 5th uninterrupted string of digits, or the 5th uninterrupted string of anything but a dot (\d and [^\.]) starting from position 1 of the input string:
WITH
-- your input ... paste it as text next time, so I don't have to manually re-type it ....
indata(s) AS (
SELECT '1201.0000.5611005.0099.211003.0000.2199.00099.00099.0000.0000.00000' FROM dual
)
SELECT
REGEXP_SUBSTR(s,'\d+',1,5) AS just_digits
, REGEXP_SUBSTR(s,'[^\.]+',1,5) AS between_dots
FROM indata;
-- out just_digits | between_dots
-- out -------------+--------------
-- out 211003 | 211003
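`re.findall` returns all non-overlapping matches from left to right. For these simple patterns, taking its n-th element gives the same result as REGEXP_SUBSTR with an occurrence argument. The sketch below is Python, not Oracle itself, and the helper name is hypothetical.

```python
import re

S = "1201.0000.5611005.0099.211003.0000.2199.00099.00099.0000.0000.00000"


def regexp_substr(s: str, pattern: str, occurrence: int):
    """Approximate REGEXP_SUBSTR(s, pattern, 1, occurrence); None if there is no such match.

    Assumes the pattern has no capture groups (re.findall would return the groups otherwise).
    """
    matches = re.findall(pattern, s)
    return matches[occurrence - 1] if occurrence <= len(matches) else None


print(regexp_substr(S, r"\d+", 5))    # 211003
print(regexp_substr(S, r"[^.]+", 5))  # 211003
```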

Show the value from text; if delimited by ;, show only the first value

I have a table with values. The column is ntext because the values are ;-delimited. A value can be empty, a single number, or numbers delimited by semicolons (as shown):
+-----------+
| room |
+-----------+
| 64 |
+-----------+
| 60008 |
+-----------+
| |
+-----------+
| 127;50047 |
+-----------+
I have this code. The SUBSTRING looks for ; and shows the first value, but it only works where the values are delimited. How can I change it so that it shows the first value both when there is a ; and when there is a single value? From the table above I should get 64, 60008, (empty) and 127.
SELECT
T0.U_Scid as 'id',
T3.U_Boarding as 'start',
T3.U_Boarding as 'end',
SUBSTRING(T5.U_Partner, 0, CHARINDEX(';', T5.U_Partner)) AS 'room_id',
CASE WHEN datalength(T5.U_Partner)=0 THEN '9999' ELSE T5.U_Partner END AS 'room_id' ,
CASE WHEN datalength(T5.U_Partner) > 4 THEN T5.U_Partner ELSE '9999' END AS 'partners_id' ,
This is just a bonus question. The CASE expressions look at the length of the value. If the value is longer than 4 (e.g. 60008), write 9999 to room_id and save 60008 to partners_id. If it is empty, write 9999 to room_id.
How do I make these work together? My idea is to get the value from T_Partner and save it into a temporary table T1.TempRoom (I suppose), so that T1.TempRoom is filled with numbers like 64, (empty), 60008, 127. Then the CASE checks T1.TempRoom and saves the values into room_id and partners_id.
Am I right?
Here is a simple method:
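SUBSTRING(T5.U_Partner, 1, CHARINDEX(';', T5.U_Partner + ';') - 1) AS room_id,
That is, concatenate a semicolon to the argument for CHARINDEX(), so that a value without a semicolon (including an empty one) finds it just past its end. The "- 1" drops the semicolon itself. Because of the appended semicolon, the length passed to SUBSTRING() is never negative, so no error occurs.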
In addition, indexing for SUBSTRING() starts at 1, not 0.
And, don't use single quotes for column names. Only use them for string and date literals.
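The trick of appending the delimiter before searching can be checked in Python with the four sample values from the question. `first_value` is a hypothetical name. `str.index` returns a 0-based position, which is exactly the number of characters before the first ';'.

```python
def first_value(partner: str) -> str:
    """Text before the first ';'; appending ';' guarantees there is one to find."""
    return partner[:(partner + ";").index(";")]


for value in ["64", "60008", "", "127;50047"]:
    print(repr(first_value(value)))
```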
EDIT:
You can always use the verbose form:
(CASE WHEN T5.U_Partner LIKE '%;%'
THEN SUBSTRING(T5.U_Partner, 1, CHARINDEX(';', T5.U_Partner) - 1)
ELSE T5.U_Partner
END) AS room_id

Teradata: Sum up values in a column

Problem Statement
An example is shown in the image below:
The last 2 rows have patterns like "1.283 2 3" in a single cell; the numbers in the column are separated by spaces. We need to add those numbers and present them in the format given in the output.
So the cell containing "1.283 2 3" must be converted to 6.283.
Challenges:
The column values are in string format.
The numbers must be added after casting them to a numeric type.
We don't want to move the data to a UNIX box and manipulate it there.
In TD14 there is a built-in table UDF named STRTOK_SPLIT_TO_TABLE. Before that release, you need to implement your own UDF or use a recursive query.
I modified an existing string splitting script to use blanks as delimiter:
CREATE VOLATILE TABLE Strings
(
groupcol INT NOT NULL,
string VARCHAR(991) NOT NULL
) ON COMMIT PRESERVE ROWS;
INSERT INTO Strings VALUES (1,'71.792');
INSERT INTO Strings VALUES (2,'71.792 1 2');
INSERT INTO Strings VALUES (3,'1.283 2 3');
WITH RECURSIVE cte
(groupcol,
--string,
len,
remaining,
word,
pos
) AS (
SELECT
GroupCol,
--String,
POSITION(' ' IN String || ' ') - 1 AS len,
TRIM(LEADING FROM SUBSTRING(String || ' ' FROM len + 2)) AS remaining,
TRIM(SUBSTRING(String FROM 1 FOR len)) AS word,
1
FROM strings
UNION ALL
SELECT
GroupCol,
--String,
POSITION(' ' IN remaining)- 1 AS len_new,
TRIM(LEADING FROM SUBSTRING(remaining FROM len_new + 2)),
TRIM(SUBSTRING(remaining FROM 1 FOR len_new)),
pos + 1
FROM cte
WHERE remaining <> ''
)
SELECT
groupcol,
-- remove the NULLIF to get 0 for blank strings
SUM(CAST(NULLIF(word, '') AS DECIMAL(18,3)))
FROM cte
GROUP BY 1
This might use a lot of spool space; hopefully you're not running it on a large table.
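The recursive CTE peels off one word per iteration. Below is a Python sketch of the same loop; the function names are hypothetical. It uses `decimal.Decimal` to match the DECIMAL(18,3) cast, so that 1.283 + 2 + 3 comes out exactly as 6.283. Blank words are skipped, much like NULLIF turns them into NULLs that SUM ignores.

```python
from decimal import Decimal


def split_words(string: str) -> list:
    """Mirror the recursive CTE: take the text before the first blank, repeat on the rest."""
    words = []
    remaining = string + " "                      # String || ' ' in the anchor member
    while remaining.strip():                      # WHERE remaining <> '' (blank-padded compare)
        pos = remaining.index(" ")                # POSITION(' ' IN remaining) - 1, as a 0-based index
        words.append(remaining[:pos].strip())     # TRIM(SUBSTRING(... FROM 1 FOR len))
        remaining = remaining[pos + 1:].lstrip()  # TRIM(LEADING FROM SUBSTRING(... FROM len + 2))
    return words


def sum_words(string: str) -> Decimal:
    """SUM(CAST(word AS DECIMAL(18,3))) for one row, skipping blank words."""
    return sum((Decimal(w) for w in split_words(string) if w), Decimal("0"))


for s in ["71.792", "71.792 1 2", "1.283 2 3"]:
    print(s, "->", sum_words(s))
```

In plain Python, `string.split()` would do the splitting in one call. The explicit loop is only there to mirror the CTE step by step.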