How to delete  in notepad++ - sql

I have a ksh file which have set of SQL commands (for ex. 35 commands) in it and I used to open it in notepad++. It is executing and returning expected results for the first 5 SQL queries, but for the rest, it is throwing an error.
When I dug into it, in between the SQL commands, there is an extra character like Â. How to remove this? I have tried a lot to get rid of it with no luck.
Thanks.

There's a Hex Editor plugin for NotePad++. Look at your file as a sequence of byte values rather than an encoded text file. Find the byte value that corresponds with the  glyph.

Related

Combine SQL files with command `copy` in a batch file introduce an incorrect syntaxe because it does add an invisible character `U+FEFF`

In a pre-build event, a batch file is executed to combine multiple SQL files into a single one.
It is done using this command :
COPY %#ProjectDir%\Migrations\*.sql %#ProjectDir%ContinuousDeployment\AllFilesMergedTogether.sql
Everything appear to work fine but somehow the result give an incorrect syntaxe error.
After two hours of investigation, it turn out the issue is caused by an invisible character that remain invisible even with notepad++.
Using an online website, the character has been spotted and is U+FEFF has shown in following image.
Here are the two input scripts.
PRINT 'Script1'
PRINT 'Script2'
Here is the output given by the copy command.
PRINT 'Script1'
PRINT 'Script2'
Additional info :
Batch file is encoded with UTF-8
Input files are encoded with UTF-8-BOM
Output file is encoded with UTF-8-BOM.
I'm not sure it is possible to change the encoding output of command copy.
I've tried and failed.
What should be done to eradicate this extremely frustrating parasitic character?
It has turned out that changing encoding of input files to ANSI does fix the issue.
No more pesky character(s).
Also, doing so does change the encoding of the result file to UTF-8 instead of UTF-8-BOM which is great I believe.
Encoding can be changed using Notepad++ as show in following picture.

Informix 11.5 SQL Select Carriage Return and Line Feed

Informix 11.5
I am trying to search for carriage returns and line feeds that may exist in a VARCHAR field. First, I need a SELECT statement to show that they exist. Second, I need to REPLACE them with a space or other character.
I've tried all kinds of variations:
CHR(10) + CHR(13)
CHR(10) || CHR(13)
CHAR(13) + CHAR(10)
CHAR(13) || CHAR(10)
SELECT CHR(10) from systables;
Everything gives an error: Routine (chr) can not be resolved.
I've been searching all over and just can't find anything that works, and I'm sure this is crazy stupid easy.
Get the ASCII package from the IIUG
The CHR() function was added to IDS 11.70; it isn't in IDS 11.50.
The good news is you can add the function because IDS is an extensible server. The better news for you is that you can obtain the relevant code from the IIUG web site in the Software Archive under the Miscellaneous section as ascii.
That should allow you to do what you need. (Note: I wrote the code way back when — before there was support built into any of the servers.)
Windows makes things more complicated
I was uploading the ascii.unl file and I get an error that the number of columns do not match on line 13. Have you seen this before? I'm on Windows 2008. The errors are:
846: Number of values in load file is not equal to number of columns.
847: Error in load file line 13.
I hadn't seen it before, but I've not tried the file on Windows and … well, let's say life gets trickier on Windows than it is on Unix (and this bit isn't all that simple on Unix).
First of all, the data file needs to have CRLF line endings instead of the NL-only line endings that are standard on Unix. (Note that NL, newline, is another name for LF, line feed — aka '\n'.) For most lines in the unload file, that isn't a problem.
The two entries for which it might be (is) a problem are for CR and LF — entries 13 and 10 respectively. In theory, if the entry for line 10 contains (in C string notation) "10|\\\n\r\n" (that is, 10, pipe, backslash, newline, CRLF), all should be OK; the absence of an error message for line 10 suggests that it is OK.
Similarly, the entry for line 13 is "13|\r\r\n", which apparently causes grief. The simplest trial fix is to add a backslash here too: "13|\\\r\r\nn". The backslash says "the next character doesn't have a special meaning". If that doesn't work, we'll probably have to try hex-escape notation: "13|\\0d\r\n" — and use dbaccess -X to enable the hex escape notation.
With luck, one of those two (or both) will work. If neither works, come back and we'll try to think of something else.
As per my above comment:
I was uploading the ascii.unl file and I get an error that the number of columns do not match on line 13. Have you seen this before? I'm on Windows 2008. 846: Number of values in load file is not equal to number of columns. 847: Error in load file line 13.
Here is what I see in the ascii.unl file.
If I put this into MS Word and turn on Show Formatting/Paragraph marks, it shows this:

how do I write my output to a file using file sink to analyze the output data?

I'm reading binary data from a(two) file(s)(.txt), after performing a logical operation(XOR),writing output to another file(.txt)(using file sink). After I execute the flow graph, and open the file, it shows something like corrupted word document.please help me deal with it.
XORing two printable character bytes may lead to unprintable characters. So your text editor may not be able to open it properly. Try to open the file with a hex editor like hexdump or okteta.

Illegal xml parsing import to sql mac roman

I have a xml that says it's encoding is UTF-8. When I use openxml to import data into sql, I always get "XML parsing: line xxxxxx, character xx, illegal xml character.
Right now I can go to each line and replace it with the a legal character and it goes well. Sometimes there maybe be more than 5 mac roman characters and it becomes tedious to replace. I am currently using notepad ++ and there is probably a way for this.
Can anyone suggest if anything can be done in sql level or does it have to checked before ran in sql?
So far, most of the characters found are, x95, x92, x96, xbc, xbd, xbo.
Thanks.
In your question, you did not specify whether illegal characters you had to remove were Unicode or not. Or whether the file was really expected to contain UTF-8 characters. Unlike for the ASCII, for UTF-8 some byte combinations are illegal, so if you declare the text file to be encoded in UTF-8, you might not be able to read it successfully till end (such a thing could never happen with ASCII).
So it is possible that by removal of <?xml version="1.0" encoding="UTF-8"?> you just declared some non-unicode encoding of your file (instead of previously declared UTF-8), so reading the data passed. You did not have many foreign characters like ľťčý in the file, did you? Normally, it is a must that you check what happened to those after the import. It might happen that your import passes without error, but city name Čadca becomes äadca and somebody will thank your company for rendering his address unreadable.

Replace character in SQL results

This is from a Oracle SQL query. It has these weird skinny rectangle shapes in the database in places where apostrophes should be. (I wish we would could paste screen shots in here)
It looks like this when I copy and paste the results.
spouse�s
is there a way to write a SQL SELECT statement that searches for this character in the field and replaces it with an apostrophe in the results?
Edit: I need to change only the results in a SELECT statement for reporting purposes, I can't change the Database.
I ran this
select dump('�') from dual;
which returned
Typ=96 Len=3: 239,191,189
This seems to work so far
select translate('What is your spouse�s first name?', '�', '''') from dual;
but this doesn't work
select translate(Fieldname, '�', '''') from TableName
Select FN from TN
What is your spouse�s first name?
SELECT DUMP(FN, 1016) from TN
Typ=1 Len=33 CharacterSet=US7ASCII: 57,68,61,74,20,69,73,20,79,6f,75,72,20,73,70,6f,75,73,65,92,73,20,66,69,72,73,74,20,6e,61,6d,65,3f
EDIT:
So I have established that is the backquote character. I can't get the DB updated so I'm trying this code
SELECT REGEX_REPLACE(FN,"\0092","\0027") FROM TN
and I"m getting ORA-00904:"Regex_Replace":invalid identifier
This seems a problem with your charset configuracion. Check your NLS_LANG and others NLS_xxx enviroment/regedit values. You have to check the oracle server, your client and the client of the inserter of that data.
Try to DUMP the value. you can do it with a select as simple as:
SELECT DUMP(the_column)
FROM xxx
WHERE xxx
UPDATE: I think that before try to replace, look for the root of the problem. If this happens because a charset trouble you can get big problems with bad data.
UPDATE 2: Answering the comments. The problem may be is not on the database server side, may be is in the client side. The problem (if this is the problem) can be a translation on server to/from client comunication. It's for a server-client bad configuracion-coordination. For instance if the server has defined UTF8 charset and your client uses US7ASCII, then all acutes will appear as ?.
Another approach can be that if the server has defined UTF8 charset and your client also UTF8 but the application is not able to show UTF8 chars, then the problem is in the application side.
UPDATE 3: On your examples:
select translate('What. It works because the � is exactly the same char: You have pasted on both sides.
select translate(Fieldname. It does not work because the � is not stored on database, it's the char that the client receives may be because some translation occurs from the data table until it's showed to you.
Next step: Look in DUMP syntax and try to extract the codes for the mysterious char (from the table not pasting �!).
I would say there's a good chance the character is a single-tick "smart quote" (I hate the name). The smart quotes are characters 91-94 (using a Windows encoding), or Unicode U+2018, U+2019, U+201C, and U+201D.
I'm going to propose a front-end application-based, client-side approach to the problem:
I suspect that this problem has more to do with a mismatch between the font you are trying to display the word spouse�s with, and the character �. That icon appears when you are trying to display a character in a Unicode font that doesn't have the glyph for the character's code.
The Oracle database will dutifully return whatever characters were INSERTed into its' column. It's more up to you, and your application, to interpret what it will look like given the font you are trying to display your data with in your application, so I suggest investigating as to what this mysterious � character is that is replacing your apostrophes. Start by using FerranB's recommended DUMP().
Try running the following query to get the character code:
SELECT DUMP(<column with weird character>, 1016)
FROM <your table>
WHERE <column with weird character> like '%spouse%';
If that doesn't grab your actual text from the database, you'll need to modify the WHERE clause to actually grab the offending column.
Once you've found the code for the character, you could just replace the character by using the regex_replace() built-in function by determining the raw hex code of the character and then supplying the ASCII / C0 Controls and Basic Latin character 0x0027 ('), using code similar to this:
UPDATE <table>
set <column with offending character>
= REGEX_REPLACE(<column with offending character>,
"<character code of �>",
"'")
WHERE regex_like(<column with offending character>,"<character code of �>");
If you aren't familiar with Unicode and different ways of character encoding, I recommend reading Joel's article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). I wasn't until I read that article.
EDIT: If your'e seeing 0x92, there's likely a charset mismatch here:
0x92 in CP-1252 (default Windows code page) is a backquote character, which looks kinda like an apostrophe. This code isn't a valid ASCII character, and it isn't valid in IS0-8859-1 either. So probably either the database is in CP-1252 encoding (don't find that likely), or a database connection which spoke CP-1252 inserted it, or somehow the apostrophe got converted to 0x92. The database is returning values that are valid in CP-1252 (or some other charset where 0x92 is valid), but your db client connection isn't expecting CP-1252. Hence, the wierd question mark.
And FerranB is likely right. I would talk with your DBA or some other admin about this to get the issue straightened out. If you can't, I would try either doing the update above (seems like you can't), or doing this:
INSERT (<normal table columns>,...,<column with offending character>) INTO <table>
SELECT <all normal columns>, REGEX_REPLACE(<column with offending character>,
"\0092",
"\0027") -- for ASCII/ISO-8859-1 apostrophe
FROM <table>
WHERE regex_like(<column with offending character>,"\0092");
DELETE FROM <table> WHERE regex_like(<column with offending character>,"\0092");
Before you do this you need to understand what actually happened. It looks to me that someone inserted non-ascii strings in the database. For example Unicode or UTF-8. Before you fix this, be very sure that this is actually a bug. The apostrophe comes in many forms, not just the "'".
TRANSLATE() is a useful function for replacing or eliminating known single character codes.