DB2 to SQL Server Linked Server OpenQuery Non-ASCII Character Issue

So I've been scouring SO for answers, and I've seen some great SQL functions that help remove non-ASCII characters from my DB. But I wanted to post the entire question and process here first, to see whether there is a fix further upstream, in my select from DB2 into SQL.
What I'm doing: getting data from a DB2 database into SQL Server.
Issue: non-ASCII characters are causing problems.
Process: it's pretty simple. I have a SQL INSERT statement that selects a bunch of columns from a DB2 linked server using OPENQUERY:
INSERT INTO [table] (stuff) SELECT stuff FROM OPENQUERY(SSF400, 'SELECT stuff FROM table')
However, in my SQL database, when I edit the landed table I see weird trailing characters: they look like spaces in the output of a SELECT statement, but they are actually artifacts that show up in SQL Edit mode.
I've tried a few functions I found here on SO to strip these characters, but afterwards I'm left with a mix of Greek and English characters.
I'm thinking there must be a better way to do the initial insert than OPENQUERY, so that the junk characters don't come over. I know SQL pretty well, but DB2 not so much. Any advice?
Update: there does seem to be a junk character or two in the source system, which I discovered using iNavigator. Also, the source system is running DB2 V7R3M0.
Update: here is a screenshot of the regular expression mentioned in the comments, used in a query in iNavigator. Although several characters were removed, some remain. The original column is on the left and the cleansed column is on the right.
Cheers,
MD

I would try REGEXP_REPLACE(stuff, '[^\u0020-\u007E\u0009\u000A\u000D]+', ''), which removes everything that is not a printable 7-bit ASCII character (U+0020 to U+007E). That also removes the 7-bit ASCII control characters, except Tab, New Line and Carriage Return, which the pattern explicitly keeps. It also removes DEL (U+007F).
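To preview what that character class keeps and drops before running it against the source system, here is a small Python sketch using the same pattern. It assumes Python's re module interprets the \uXXXX escapes the same way the DB2 REGEXP_REPLACE implementation does, which is worth confirming on your release; the function name is hypothetical. Note that a non-breaking space (U+00A0) is deleted rather than turned into a normal space, so neighbouring words can run together.

```python
import re

# Same character class as the REGEXP_REPLACE call above: match anything that
# is NOT printable 7-bit ASCII (U+0020..U+007E), Tab, Line Feed or Carriage Return.
DISALLOWED = re.compile(r"[^\u0020-\u007E\u0009\u000A\u000D]+")


def strip_disallowed(value: str) -> str:
    """Delete every run of characters outside the kept set."""
    return DISALLOWED.sub("", value)


if __name__ == "__main__":
    samples = [
        "Plain ASCII stays as-is",
        "Tab\tNewline\nCR\r kept",
        "NUL\x00 and DEL\x7f removed",
        "Caf\u00e9 loses the accented letter",
        "Lisa\u00a0Hettinger",  # the non-breaking space is deleted, not replaced
    ]
    for s in samples:
        print(repr(s), "->", repr(strip_disallowed(s)))
```

If deleting is too aggressive for some characters (such as the non-breaking space), replace those specific characters with a plain space first and only then apply the pattern.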

Related

RODBC Error "Some part of your SQL statement is nested too deeply"

I have come across an error several times when working with R: the RODBC package can't execute an SQL query string, yet when I type the exact same string directly into a SQL Server query it works. Note that my strings contained umlauts.
I'm answering this question myself to help others avoid long internet searches if instead it can be simply reduced to this.
Almost always it was just a character-encoding error. Using umlauts or other non-ASCII characters in an R string with the RODBC package produces this kind of error. So before trying to break the query into sub-queries as the error message suggests, check whether your string contains only ASCII characters.
If it does, then the query really is too complex and needs to be split into sub-queries. For this, please refer to the other questions about this topic.
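One way to run that check is to list every non-ASCII character in the query string together with its position. The sketch below is written in Python rather than R, purely to illustrate the check; the function name and the sample query are hypothetical.

```python
from typing import List, Tuple


def non_ascii_chars(query: str) -> List[Tuple[int, str]]:
    """Return (0-based index, character) for every character above U+007F."""
    return [(i, ch) for i, ch in enumerate(query) if ord(ch) > 0x7F]


if __name__ == "__main__":
    query = "SELECT * FROM t WHERE name = 'Müller'"
    offenders = non_ascii_chars(query)
    if offenders:
        print("Non-ASCII characters found:", offenders)
    else:
        print("Query is pure ASCII")
```

An empty result means the string is plain ASCII, so the encoding explanation can be ruled out before splitting the query.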

Removing special characters in sql

I am new to SQL and just know basic insert, update, and delete syntax.
I have an Excel file that I imported into SQL Server, but somehow it brought in weird symbols and characters.
When I checked the Excel file, cleared all formatting, and re-uploaded it, the characters still showed up, and I'm not sure how to clean them up.
Is there an easy replace syntax that you can suggest for me to use to do a global cleanup?
Sample values inside the columns are:
Lisa Hettinger  Cherry Creek Prop
Lisa J Hernandez
I would need to remove the weird ┬ and á characters.
If all you have are non-breaking space characters (NCHAR(160)), you can try using the REPLACE function like this:
SELECT REPLACE(N'Lisa' + NCHAR(160) + N'J' + NCHAR(160) + N'Hernandez', NCHAR(160), N' ')
I suspect the source code page uses a non-breaking space as its space character.
Update: to update the values in the entire column, you can use
UPDATE [MyTableName] SET [MyColName] = REPLACE([MyColName], NCHAR(160), N' ');
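The ┬á pair fits that suspicion: a non-breaking space (U+00A0) is stored in UTF-8 as the two bytes C2 A0, and code page 437 displays those bytes as ┬ and á. The Python sketch below reproduces that misreading and reverses it. It assumes the file was saved as UTF-8 and then read as CP437, which the question does not confirm, and the function names are hypothetical.

```python
NBSP = "\u00a0"  # non-breaking space


def simulate_misread(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as code page 437."""
    return text.encode("utf-8").decode("cp437")


def repair(garbled: str) -> str:
    """Undo the CP437 misread, then turn non-breaking spaces into plain spaces.

    Raises UnicodeEncodeError or UnicodeDecodeError if the text did not
    actually come from a UTF-8 -> CP437 misread.
    """
    return garbled.encode("cp437").decode("utf-8").replace(NBSP, " ")


if __name__ == "__main__":
    original = f"Lisa{NBSP}J{NBSP}Hernandez"
    garbled = simulate_misread(original)
    print(garbled)          # Lisa┬áJ┬áHernandez
    print(repair(garbled))  # Lisa J Hernandez
```

If the imported rows literally contain the ┬ and á characters (rather than a non-breaking space that merely displays that way), a REPLACE of those two characters is needed instead of NCHAR(160).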

sql server conditional replace of csv data

I have a csv file that I am trying to import using BULK INSERT. The problem is that there is a field in the file that will be quoted (with double quotes) if a comma exists within the text (not quoted if no comma exists). The existence of the extra comma is causing SQL Server to throw errors because of an incorrect number of columns during the insert.
Here is a sample data set:
928 Riata Dr,Magnolia,TX,77354,4/15/2014
22 Roberts Ave.,McKinney,TX,75069,4/15/2014
"5531 Trinity Place, #22",San Antonio,TX,78212,4/15/2014
As you can see, the third row contains a comma within the address field, thus the address field is quoted. Since the BULK INSERT command is throwing errors because of this, I'm assuming I will need to scrub the file contents before attempting to load it.
Unless someone has a better solution, that is.
To scrub the file contents I will need to open the file (with SQL), read in the contents, and do a conditional replacement of the internal comma (found within the quotes). Since that comma doesn't really need to exist, I can just replace it with '' (blank).
Then, I can handle the quotes separately after the data gets imported with an update statement to replace any other characters I don't want.
I think the logic is sound, the problem is the syntax. I can't seem to find any syntax related to REGEX in SQL Server (Booo Microsoft). Which means I would need some other way to determine if the comma appears within quotes, and replace it if so.
Any thoughts, Suggestions, Code, etc.?
Thanks in advance.
This sounds too simple on the face of it, but if you can just replace the commas, can you open the CSV in, say, Excel or OpenOffice Calc, and then do a find and replace (commas with nothing)? I just tried it with one of my CSVs and it worked fine. The CSV remains properly delimited.
Maybe I am missing something that prevents this, such as Excel opening this with extra cells due to the comma, in which case my answer is stupid. But it would make more sense to handle this in a spreadsheet app rather than after opening with SQL.
You may have to try delimiting with something other than commas, such as tabs. I've had to do this with SQL imports before. In many cases you can save as a tab-delimited TXT file and upload that to SQL.
Note that using Excel for this type of thing can be its own problem. For help with Excel and tab delimited SQL imports, see my answer here.
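If you would rather script the question's quote-aware replacement than do it by hand in a spreadsheet, a CSV parser already knows which commas sit inside quotes. The Python sketch below (a swapped-in tool, not T-SQL) reads each row with the standard csv module, deletes commas inside fields, and writes the rows back out. Because no field contains a comma afterwards, the writer emits no quotes for those fields either, so the result should load with a plain comma-delimited BULK INSERT. The function name is hypothetical.

```python
import csv
import io


def scrub_embedded_commas(text: str) -> str:
    """Remove commas that sit inside (quoted) fields, keeping the delimiters."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # QUOTE_MINIMAL by default
    for row in csv.reader(io.StringIO(text, newline="")):
        writer.writerow([field.replace(",", "") for field in row])
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "928 Riata Dr,Magnolia,TX,77354,4/15/2014\n"
        "22 Roberts Ave.,McKinney,TX,75069,4/15/2014\n"
        '"5531 Trinity Place, #22",San Antonio,TX,78212,4/15/2014\n'
    )
    print(scrub_embedded_commas(sample), end="")
```

Fields that contain double quotes or line breaks would still be quoted on output, so this only removes the embedded-comma problem, not every quoting case.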

SQL Left/Delimited Character

Pretty simple one today. I've got a column, let's call it title, with a bunch of project titles. What I need to do is pull everything to the left of the ":" and do a left/right trim (I'm then going to use that in a join later on, but for now I just need a column with the new data). So here's an example of what the current column looks like:
And here's what I need it to look like after the query is run:
The problem is that while the numbers are 6 characters long now, I can't guarantee they'll always be 6 characters. If I were doing this in Excel, I'd use the Delimited feature or just write a LEFT/LEN/SEARCH formula. I'm wondering how to do the same in SQL. BTW, I'm using SQL Server Management Studio.
Thoughts?
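Assuming that your number is always followed by a [space]:[space], simply look for that first space and use its position, minus one, as the length argument of a LEFT operation. Appending a space to Title means a row without any space returns the whole title instead of raising an invalid-length error:
SELECT LEFT(Title, CHARINDEX(' ', Title + ' ') - 1) AS "New Title"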
P.S. Just say you're using MS SQL Server; SSMS is just a management front end for that database.
Check this post out; it does exactly what you are trying to do:
SQL Server replace, remove all after certain character
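The off-by-one matters here: CHARINDEX returns the 1-based position of the match, so passing it straight to LEFT keeps the delimiter itself. The Python sketch below mimics the two T-SQL functions (the helper names are hypothetical) and splits on ":" as the question describes, trimming the result. The sample titles are made up, since the question's own examples are not shown.

```python
def charindex(needle: str, haystack: str) -> int:
    """Like T-SQL CHARINDEX: 1-based position of needle, or 0 if absent."""
    return haystack.find(needle) + 1


def left(s: str, length: int) -> str:
    """Like T-SQL LEFT for a non-negative length."""
    return s[:length]


def prefix_before_colon(title: str) -> str:
    """Everything left of the first ':' with surrounding whitespace trimmed.

    Appending ':' guarantees a match, so a title without a colon comes back
    whole instead of producing a negative length. Python's strip() also
    removes tabs and newlines, unlike LTRIM/RTRIM, which trim only spaces
    by default.
    """
    return left(title, charindex(":", title + ":") - 1).strip()


if __name__ == "__main__":
    for title in ["123456 : Hypothetical project", "1234567: Another one", "No colon"]:
        print(repr(prefix_before_colon(title)))
```

The T-SQL equivalent of prefix_before_colon would combine LTRIM, RTRIM, LEFT and CHARINDEX in the same order.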

Issues with Chr(0) in SQL INSERT script

We currently use the SQL Publishing Wizard to back up our database schemas and data. However, some of our tables hold hashed passwords that contain the null character (chr(0)). When SQL Publishing Wizard generates the insert data scripts, the null character causes errors when we try to run the resulting SQL: it appears to ignore ALL TEXT after the first instance of this character in a script. We recently tried out RedGate SQL Compare and found that it has the same issue with this character. I have confirmed it is ASCII character code 0 by running the ASCII() SQL function against the offending record.
A sample of the error we are getting is:
Unclosed quotation mark after the character string '??`????{??0???
The fun part is, I can't really paste a sample Insert statement because of course everything that appears after the CHR(0) is being omitted when pasting!
Change the definition of the column to VARBINARY. The data you store in there doesn't seem to be an appropriate VARCHAR to start with.
This will ripple through the code that uses the column, since you'll get a byte[] CLR type back in the client, and you should change your insert/update code accordingly. But after all, a password hash is a byte[], not a string.
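Once the column is VARBINARY, a generated script can carry the hash as a hexadecimal binary constant (0x...), which consists only of ASCII hex digits, so no NUL byte is left in the script text to truncate it. The Python sketch below shows the idea; the table name dbo.Users, the column names and the sample bytes are all hypothetical.

```python
def to_tsql_binary_literal(data: bytes) -> str:
    """Render raw bytes as a T-SQL binary constant, e.g. bytes 00 AB -> 0x00AB."""
    return "0x" + data.hex().upper()


def insert_statement(user_id: int, password_hash: bytes) -> str:
    """Build an INSERT for a hypothetical dbo.Users table with a VARBINARY hash."""
    return (
        "INSERT INTO dbo.Users (UserId, PasswordHash) "
        f"VALUES ({user_id}, {to_tsql_binary_literal(password_hash)});"
    )


if __name__ == "__main__":
    # Hypothetical hash bytes that include 0x00, the byte that broke the scripts.
    sample_hash = bytes([0x3F, 0x00, 0xA1, 0x7E])
    print(insert_statement(42, sample_hash))
    # INSERT INTO dbo.Users (UserId, PasswordHash) VALUES (42, 0x3F00A17E);
```

Scripting tools generally emit VARBINARY data in this hex form, which is one more reason the column type change fixes the truncation.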