Delete empty lines in a text blob

I have a list of email addresses in a text blob column called cc_list (blob(text)). To remove an email address from the list, I used the replace function:
update actions
set cc_list=replace(cc_list,'email#bob.com','')
where contact_id=85
Now, the list shows as
email
email
email
email
In the GUI, I just see the empty lines at the top. I have tried using trim and removing the carriage return (ascii_char(9)) with:
trim(leading from replace (cc_list, ascii_char(9),''))
and
replace(cc_list,ascii_char(9),'')
I still see the empty lines.
What can I do to fix this?

The obvious solution would be to normalize your database: instead of storing a list of things in an unstructured datatype like a blob, store the email addresses in a separate table with a many-to-one relationship.
Your replace leaves a line break behind because you only replace the email address: replacing <address2> in <address1><LF><address2><LF><address3> with an empty string leaves you with <address1><LF><LF><address3>.
trim(leading ...) doesn't work because it only removes characters at the start of the blob, while your extra line break is in the middle of the blob. In addition, by default trim only trims spaces (character 32).
replace(..., ascii_char(9), '') doesn't work because character 9 is a TAB, not a linefeed (LF, character 10) or a carriage return (CR, character 13). In addition, replacing every single line break would remove all line breaks from the blob, making the list invalid because all the email addresses would end up on a single line.
Assuming your blob only contains linefeeds (LF) (and not carriage returns (CR) or CRLF), you can fix the already broken blobs by replacing every occurrence of two consecutive linefeed characters with a single linefeed:
replace(cc_list, x'0a0a', x'0a')
(use x'0d0d', x'0d' for CR, or x'0d0a0d0a', x'0d0a' for CRLF)
or (if you're using a Firebird version that does not support hexadecimal literals):
replace(cc_list, ascii_char(10) || ascii_char(10), ascii_char(10))
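To see where the empty line comes from and what the repair does, here is a small sketch that uses Python's built-in sqlite3 module with SQLite as a stand-in for Firebird. It assumes SQLite's replace(), char() and || behave like Firebird's replace(), ascii_char() and || for this purpose; the sample addresses are made up.

```python
import sqlite3

# In-memory SQLite database standing in for the Firebird "actions" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actions (contact_id INTEGER, cc_list TEXT)")
# Hypothetical data: LF-separated addresses, with an LF after the last one.
conn.execute(
    "INSERT INTO actions VALUES (85, ?)",
    ("a#x.com\nemail#bob.com\nc#x.com\n",),
)

# The question's update: removes the address but not its line break,
# so two linefeeds end up next to each other.
conn.execute(
    "UPDATE actions SET cc_list = replace(cc_list, 'email#bob.com', '') "
    "WHERE contact_id = 85"
)
broken = conn.execute("SELECT cc_list FROM actions").fetchone()[0]

# The repair: collapse each pair of consecutive linefeeds into one.
conn.execute(
    "UPDATE actions "
    "SET cc_list = replace(cc_list, char(10) || char(10), char(10)) "
    "WHERE contact_id = 85"
)
repaired = conn.execute("SELECT cc_list FROM actions").fetchone()[0]

print(repr(broken))    # 'a#x.com\n\nc#x.com\n'
print(repr(repaired))  # 'a#x.com\nc#x.com\n'
```

Because replace() substitutes non-overlapping matches, a run of three linefeeds becomes two after one pass, so a blob from which adjacent addresses were removed may need the update run more than once.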
Going forward, replace the email address together with the line break that follows it. Note that this assumes the last email address in the list is also followed by a line break:
replace(cc_list, 'email#bob.com' || ascii_char(10), '')
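The same SQLite stand-in (an assumption, with char(10) playing the role of ascii_char(10)) shows that removing the address together with its trailing linefeed leaves no empty line behind. The data is hypothetical and ends with a linefeed, as the answer requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actions (contact_id INTEGER, cc_list TEXT)")
conn.execute(
    "INSERT INTO actions VALUES (85, ?)",
    ("a#x.com\nemail#bob.com\nc#x.com\n",),  # hypothetical, trailing LF
)

# Remove the address and the linefeed that follows it in one replace.
conn.execute(
    "UPDATE actions "
    "SET cc_list = replace(cc_list, 'email#bob.com' || char(10), '') "
    "WHERE contact_id = 85"
)
result = conn.execute("SELECT cc_list FROM actions").fetchone()[0]
print(repr(result))  # 'a#x.com\nc#x.com\n'
```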

Related

Query to replace special characters in phone number field

Can anyone help with a query to remove special/non-numeric/hidden characters from a phone number column?
I've tried
LTRIM(RTRIM(REGEXP_REPLACE(
PHONE_NBR,
'[^[:digit:]][:cntrl:][:alpha:][:graph:][:blank:][:print:][:punct:][:space:]~',
'')))
but no luck, there are still a few records which contain non-numeric values.

Your regex is saying to ONLY replace a string consisting of: a non-numeric character followed by a control character, an alpha, a graph, a blank, a print, a punct, a space, and then a tilde.
You should be able to just use '[^[:digit:]]' as your regex to remove all non-numeric characters.
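The regex can be tried outside the database first. This Python sketch uses [^0-9] because Python's re module does not support POSIX classes such as [:digit:]; the sample number is made up. Since every non-digit, including whitespace, is removed, the LTRIM/RTRIM wrapper is no longer needed.

```python
import re


def digits_only(phone):
    # ASCII equivalent of REGEXP_REPLACE(PHONE_NBR, '[^[:digit:]]', ''):
    # delete every character that is not 0-9.
    return re.sub(r"[^0-9]", "", phone)


cleaned = digits_only("(555) 123-4567\t\x00")
print(cleaned)  # 5551234567
```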

I was wondering if there is any way to treat delimiters inside quotes as merely characters and not delimiters

I have a large number of files that all use the same schema. They are space delimited. A sample row looks like this:
1 2 abc def "g h" 3
When I try to use the schema INT, INT, STRING, STRING, STRING, INT, the load fails because of the space inside the quotation marks.
I know this is where the error is, because if I make the sample tab separated instead of space separated, no error occurs, but converting all of my data is not feasible. Is there any way to indicate in a file upload that delimiters inside quotes should be treated as characters rather than delimiters (that is, that all quoted text should be treated as one string)?
I know this feature exists for new line characters, and so I was wondering about delimiters.
Thank you!

I figured it out. The error was caused by an extra delimiter character at the end of the file. Now I just need to trim each line of the file before uploading.
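The question doesn't name the loader, so as an illustration only, Python's csv module shows what quote-aware splitting on a space delimiter looks like, and why a trailing delimiter produces an extra empty field that no longer matches a six-column schema.

```python
import csv


def parse(line):
    # Split one line on spaces; text inside double quotes stays one field.
    return next(csv.reader([line], delimiter=" "))


clean = parse('1 2 abc def "g h" 3')
trailing = parse('1 2 abc def "g h" 3 ')            # extra delimiter at the end
trimmed = parse('1 2 abc def "g h" 3 '.rstrip())  # trimmed before parsing

print(clean)     # ['1', '2', 'abc', 'def', 'g h', '3']
print(trailing)  # ['1', '2', 'abc', 'def', 'g h', '3', '']
print(trimmed)   # ['1', '2', 'abc', 'def', 'g h', '3']
```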

How can I replace Pipe (|) with space using regexp_replace in Teradata?

I would like to replace all pipes and line breaks with spaces in a free text field in my database.
My current approach looks like the following:
SELECT
ID,
REGEXP_REPLACE(REGEXP_REPLACE(FREETEXT,'|',' '),'\n',' ')
FROM TABLE
My idea is to replace the pipes with spaces first and then replace all line breaks in the result. The problem is that there are still pipes in the output, which messes up the CSV since my delimiter is |.
I hope someone can help me out here.
PS: I am not able to change the delimiter to something else.

The pipe symbol is a special character in a regular expression (it splits the pattern into alternatives), so you must escape it.
If you want to replace all pipe and line break characters you don't have to nest:
RegExp_Replace(FREETEXT,'[\|\n\r]',' ')
\| pipe 0x7C
\n line feed 0x0A
\r carriage return 0x0D
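But as those are single characters, you can simply use OTranslate, with one space for each of the three characters (characters in the from-string without a counterpart in the to-string are removed rather than replaced):
OTranslate(FREETEXT, '7C0A0D'xc, '   ')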
Only if you want to replace consecutive occurrences of those characters with a single space do you need a RegEx:
RegExp_Replace(FREETEXT,'[\|\n\r]+',' ')
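For comparison, here is how the three variants behave on a made-up string, with Python's re.sub and str.translate standing in for Teradata's RegExp_Replace and OTranslate. This is a sketch of the matching logic, not Teradata itself.

```python
import re

text = "a|b\r\nc||d"

# Each pipe, LF or CR becomes one space (like '[\|\n\r]').
per_char = re.sub(r"[|\n\r]", " ", text)

# One-to-one character mapping; str.maketrans requires equal-length strings.
translated = text.translate(str.maketrans("|\n\r", "   "))

# With '+', each run of those characters collapses to a single space.
collapsed = re.sub(r"[|\n\r]+", " ", text)

print(repr(per_char))   # 'a b  c  d'
print(repr(collapsed))  # 'a b c d'
```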

DB2/iSeries SQL clean up CR/LF, tabs etc

I need to find and clean up line breaks, carriage returns, tabs and "SUB"-characters in a set of 400k+ string records, but this DB2 environment is taking a toll on me.
I thought I could do some searching and replacing with the REPLACE() and CHR() functions, but CHR() does not seem to be available on this system (Error: CHR in *LIBL type *N not found). Using \t, \r, \n etc. doesn't seem to work either. The characters can be in the middle of strings or at the end of them.
DBMS = DB2
System = iSeries
Language = SQL
Encoding = Not sure, possibly EBCDIC
Any hints on what I can do with this?

I used this SQL to find x'25' and x'0D':
SELECT
<field>
, LOCATE(x'0D', <field>) AS "0D"
, LOCATE(x'25', <field>) AS "25"
, length(trim(<field>)) AS "Length"
FROM <file>
WHERE LOCATE(x'25', <field>) > 0
OR LOCATE(x'0D', <field>) > 0
And I used this SQL to replace them:
UPDATE <file>
SET <field> = REPLACE(REPLACE(<field>, x'0D', ' '), x'25', ' ')
WHERE LOCATE(x'25', <field>) > 0
OR LOCATE(x'0D', <field>) > 0
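As a sketch of why x'0D' and x'25' are the values to look for: Python's cp037 codec (EBCDIC US/Canada, an assumption since the question's CCSID is unknown) encodes CR and LF to exactly those bytes, and a bytes-level replace mirrors the UPDATE above, with x'40' being the EBCDIC space.

```python
# Hypothetical field value, encoded as EBCDIC code page 037.
record = "ABC\r\nDEF".encode("cp037")
print(hex(record[3]), hex(record[4]))  # 0xd 0x25

# Same idea as REPLACE(REPLACE(<field>, x'0D', ' '), x'25', ' ').
cleaned = record.replace(b"\x0d", b"\x40").replace(b"\x25", b"\x40")
print(cleaned.decode("cp037"))  # ABC  DEF
```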

If you want to clean up specific characters like carriage return (EBCDIC x'0d') and line feed (EBCDIC x'25'), find the corresponding EBCDIC codes and then use the TRANSLATE() function to replace them with spaces.
If you just want to remove undisplayable characters, look for anything below x'40'.
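A minimal Python sketch of the "below x'40'" check, again assuming CCSID 37 (the cp037 codec), where x'40' is the space and the control characters sit below it. The sample value is made up.

```python
record = "ABC\r\nDEF\t".encode("cp037")

# Position and hex value of every byte below x'40' (control characters).
controls = [(i, hex(b)) for i, b in enumerate(record) if b < 0x40]
print(controls)  # [(3, '0xd'), (4, '0x25'), (8, '0x5')]

# Dropping those bytes keeps only the displayable characters.
printable = bytes(b for b in record if b >= 0x40).decode("cp037")
print(printable)  # ABCDEF
```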

Here is a sample script that replaces X'41' with X'40'; X'41' was creating issues at our shop:
UPDATE [yourfile] SET [yourfield] = TRANSLATE([yourfield], X'40',
X'41') WHERE [yourfield] like '%' concat X'41' concat '%'
If you need to replace more than one character, extend the "to" and "from" hexadecimal strings to the values you need in the TRANSLATE function.
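Python's bytes.translate behaves like TRANSLATE with multi-character strings: each byte in the "from" string maps to the byte at the same position in the "to" string. Note the argument order differs (bytes.maketrans takes from, then to; TRANSLATE takes to, then from). This sketch assumes cp037, and the X'0D2505' set (CR, LF, TAB) is chosen for illustration.

```python
# Rough analogue of TRANSLATE(field, X'404040', X'0D2505').
table = bytes.maketrans(b"\x0d\x25\x05", b"\x40\x40\x40")

record = "A\rB\nC\tD".encode("cp037")
result = record.translate(table).decode("cp037")
print(result)  # A B C D
```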

Try TRANSLATE or REPLACE.
The brute force method involves using POSITION to find the errant character, then SUBSTR before and after it. CONCAT the two substrings (less the undesirable character) to re-form the column.
The character encoding is almost certainly one of the EBCDIC character sets. Depending on how the table was loaded in the first place, the CR may be x'0d' and the LF x'15' or x'25'. An easy way to find out is to go to a green screen and run DSPPFM against the table. Press F10 then F11 to view the table in raw hexadecimal (over/under) format.
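The brute force method can be sketched in Python, with str.find standing in for POSITION and slicing for SUBSTR (0-based here, unlike SQL's 1-based positions). The helper name remove_char is hypothetical.

```python
def remove_char(value, bad):
    # Find the errant character, keep the text before and after it,
    # concatenate the two pieces, and repeat until none are left.
    pos = value.find(bad)
    while pos != -1:
        value = value[:pos] + value[pos + 1:]
        pos = value.find(bad)
    return value


print(remove_char("AB\rCD\r", "\r"))  # ABCD
```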

For details on the available functions, see the DB2 for i5/OS SQL Reference.

Perhaps the TRANSLATE() function will serve your needs.
TRANSLATE( data, tochars, fromchars )
...where fromchars is the set of characters you don't want, and tochars is the corresponding characters you want them replaced with. You may have to write this out in hex format, as x'nnnnnn...' and you will need to know what character set you are working with.
Using the DSPFFD command on your table should show the CCSID of your fields.

We struggled a lot to remove newline and carriage return characters from a flat file.
In the end we used the SQL below to sort out the issue:
REPLACE(REPLACE(COLUMN_NAME, CHR(13), ''), CHR(10), '')
Try it out
CR = CHR(13)
LF = CHR(10)

CSV Carriage Return Character

I have a CSV output in one of my applications. It produces a file from web form data.
In some cases I am getting a carriage return character in my notes field. This causes an error when importing the file. I would like to remove this character.
The issue appears to happen when users paste information into the form from Word documents, or hold down the Shift key and press Enter.
The field is ntext and populated in a multi line text box control.
I have been trying to remove this with a replace function but some carriage return characters seem to be getting through.
SQL
REPLACE(Fieldname, CHAR(13) + CHAR(10), ' ') AS new_Fieldname

It may be best to replace the characters separately, as they do not always occur together or in that order:
REPLACE(REPLACE(Fieldname, CHAR(13),' '), CHAR(10), ' ') AS new_Fieldname
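A quick Python sketch (str.replace standing in for T-SQL REPLACE, with a made-up notes value) shows why the combined CR+LF replacement lets lone carriage returns and line feeds through, while replacing them separately catches every one.

```python
sample = "line one\r\nline two\rline three\nline four"

# Like the question's REPLACE(Fieldname, CHAR(13) + CHAR(10), ' ').
pair_only = sample.replace("\r\n", " ")

# Like REPLACE(REPLACE(Fieldname, CHAR(13), ' '), CHAR(10), ' ').
separate = sample.replace("\r", " ").replace("\n", " ")

print(repr(pair_only))  # lone '\r' and '\n' are still present
print(repr(separate))   # no line breaks left
```

Note that a CR+LF pair becomes two spaces with the separate approach.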

Note that you may have a carriage return + line feed, or just a carriage return (depending on the source platform, the source of the data etc.). So you will probably need to handle both cases.

You can read CSVs that contain carriage returns. The carriage return should be inside a string field (i.e. one surrounded by quotes). This allows a field to span multiple lines. If you are reading your CSV one line at a time, you need to maintain state between lines and append the data as necessary.
In .NET, the easiest way to read a CSV is the Microsoft.VisualBasic.FileIO.TextFieldParser class (yes, you can use it from C# if you add a reference). It reads even the nastiest CSVs I've thrown at it with ease.
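The same principle can be sketched with Python's csv module as a stand-in for TextFieldParser: a quoted field may contain a CRLF and still come back as one field. The exported data below is hypothetical.

```python
import csv
import io

# Hypothetical export: the notes field is quoted and contains a CRLF.
data = 'id,notes\r\n1,"line one\r\nline two"\r\n2,plain\r\n'

# newline="" leaves line endings untranslated, as the csv docs recommend,
# so the embedded CRLF stays inside the quoted field.
rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows)
```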

In Word, there are different kinds of newline characters, so you may need to search for and replace the other ones as well.
I'm not sure of all the possibilities, but the paragraph mark is one that I know of.
