Removing special characters in SQL

I am new to SQL and know only basic INSERT, UPDATE, and DELETE syntax.
I have an Excel file that I imported into SQL Server, but somehow the import brought in weird symbols and characters.
When I checked the Excel file, cleared all formatting, and re-uploaded it, the characters still showed up, and I am not sure how to clean them up.
Is there an easy REPLACE syntax that I can use to do a global cleanup?
Sample values inside the columns are:
Lisa Hettinger  Cherry Creek Prop
Lisa J Hernandez
I would need to remove the weird ┬ and á characters.

If all you have are the "┬á" characters, you can try using the REPLACE function like this:
SELECT REPLACE(N'Lisa┬áJ┬áHernandez', N'┬á', N' ')
I suspect the source code page has "┬á" as its space character.
Update: To update the values in the entire column, you can use
UPDATE [MyTableName] SET [MyColName] = REPLACE([MyColName], N'┬á', N' ');
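One plausible explanation for the ┬ and á pair, offered as a guess rather than a diagnosis: a non-breaking space (U+00A0) is the two bytes C2 A0 in UTF-8, and those two bytes display as exactly "┬á" when read with the old CP437 code page. The Python sketch below reproduces that. The `clean` helper is a hypothetical name, and the sample value is the one from the question.

```python
# Assumption: the data was UTF-8 text containing non-breaking spaces (U+00A0)
# that was decoded with the legacy CP437 code page somewhere along the way.
NBSP = "\u00a0"

# U+00A0 is the two bytes C2 A0 in UTF-8; CP437 renders them as "┬" and "á".
garbled = NBSP.encode("utf-8").decode("cp437")
print(garbled)  # ┬á

def clean(value: str) -> str:
    """Replace the garbled pair, and any real non-breaking space, with a plain space."""
    return value.replace(garbled, " ").replace(NBSP, " ")

print(clean("Lisa┬áJ┬áHernandez"))  # Lisa J Hernandez
```

If that is what happened, the stored value may contain either the two visible characters or a real non-breaking space (NCHAR(160) in SQL Server), so it can be worth replacing both.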

Related

DB2 to SQL LinkedServer OpenQuery NonAscii Character Issue

So I've been scouring SO for an answer and I've seen some great SQL functions to help remove non-ASCII characters from my db, but I wanted to post the entire question / process here first to see if there is a fix upstream, in my select from DB2 into SQL.
What I'm doing: Getting data from a db2 database into SQL
Issue: Non-ascii characters causing problems
Process: It's pretty simple. I have a SQL Insert statement to select a bunch of columns from a db2 linkedserver using open query
insert into [table](stuff) select (stuff) From Openquery(SSF400,'select stuff from table')
However, in my SQL db, when editing the landed table, I'm getting weird trailing characters that appear as a space in a sql select statement, but are actually artifacts in SQL Edit mode:
I've tried using a few functions I found here on SO to strip these characters, but after these function(s) I'm leftover with a combination of greek/english characters similar to the below:
I'm thinking there must be a better way for me to do the initial insert other than using openquery so that the junk characters don't come over. I know SQL pretty well, but DB2 not so much...any advice?
Update: There does seem to be a junk character or two in the source system. Discovered using iNavigator. Also, source system is using db2 v7r3m0
Update here is a screenshot of the regexp expression mentioned in the comments used in a query in iNavigator. Although several characters were removed, some do remain. The original column is on the left, the cleansed column is on the right.
Cheers,
MD
I would try REGEXP_REPLACE(stuff,'[^\u0020-\u007E\u0009\u000A\u000D]+','') which removes everything outside the printable 7-bit ASCII range. It keeps Tab, Line Feed and Carriage Return but removes every other ASCII control character, including DEL.
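To see what that character class keeps and drops without touching the database, here is a small Python sketch using the same pattern. The function name and the sample string are made up for illustration; this is not DB2 code.

```python
import re

# Same character class as the REGEXP_REPLACE pattern above: keep printable
# ASCII (U+0020..U+007E) plus tab, line feed and carriage return.
NON_PRINTABLE_ASCII = re.compile(r"[^\u0020-\u007E\u0009\u000A\u000D]+")

def strip_non_ascii(value: str) -> str:
    """Remove every character that the pattern does not allow."""
    return NON_PRINTABLE_ASCII.sub("", value)

# Hypothetical sample: non-breaking space, NUL, DEL, a tab and a Greek letter.
sample = "ACME\u00a0Corp\x00\x7f\tLtd\u03a9"
print(repr(strip_non_ascii(sample)))  # 'ACMECorp\tLtd'
```

Note that the non-breaking space is deleted rather than turned into a normal space, so words can end up glued together. Replacing it with a space first may be preferable.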

INSERT Statement in SQL Server Strips Characters, but using nchar(xxx) works - why?

I have to store some strange characters in my SQL Server DB which are used by an Epson Receipt Printer code page.
Using an INSERT statement, all are stored correctly except one - [SCI] (nchar(154)). I realise that this is a control character that isn't representable in a string, but the character is replaced by a '?' in the stored DB string, suggesting that it is being parsed (unsuccessfully) somewhere.
The collation of the database is LATIN1_GENERAL_CI_AS so it should be able to cope with it.
So, for example, if I run this INSERT:
INSERT INTO Table(col1) VALUES ('abc[SCI]123')
Where [SCI] is the character, a resulting SELECT query will return 'abc?123'.
However, if I use NCHAR(154), by directly inserting or by using a REPLACE command such as:
UPDATE Table SET col1 = REPLACE(col1, '?', NCHAR(154))
The character is stored correctly.
My question is, why? And how can I store it directly from an INSERT statement? The latter is preferable as I am writing from an existing application that produces the INSERT statement that I don't really want to have to change.
Thank you in advance for any information that may be useful.
When you write a literal string in SQL, it is created as a VARCHAR unless you prefix it with N. This means that any character the database's code page cannot represent is replaced with '?'. Instead, write your INSERT statement like this:
INSERT INTO Table(col1) VALUES (N'abc[SCI]123')
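As a rough analogy in Python (not SQL Server itself): Latin1_General_CI_AS uses code page 1252, and that code page has no byte for U+009A, so a lossy conversion has to substitute something. The variable names below are made up for illustration.

```python
SCI = "\u009a"  # the control character from the question, NCHAR(154)
text = f"abc{SCI}123"

# Code page 1252 has no byte for U+009A, so a lossy conversion substitutes "?".
as_varchar = text.encode("cp1252", errors="replace").decode("cp1252")
print(as_varchar)  # abc?123

# A UTF-16 round trip, similar to what an N'...' literal keeps, is lossless.
as_nvarchar = text.encode("utf-16-le").decode("utf-16-le")
print(as_nvarchar == text)  # True
```

That matches the 'abc?123' seen in the question, and it is why NCHAR(154) works: the value never passes through the single-byte code page.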

How to delete a field or row in Spatialite GUI

I am trying to find an easy way to remove columns/fields from an existing QGIS Spatialite database file. I am new to both Spatialite GUI and SQL, but I want to get the said job done. I right-clicked on a layer (for-China) and chose 'Show columns' from the context menu. Then I got an error message:
SQL error: "near "-": syntax error"
so I tried executing the statement:
PRAGMA table_info('for-China');
alter table 'for-China'
delete row 'note';
and the table showed up, but the NOTE row wasn't deleted:
I tried using COLUMN instead of ROW and also tried using DROP instead of DELETE but NOTE is still left untouched. I am confused on what to do to delete the NOTE row.
I assume that spatialite uses the same escape characters as SQLite. Hence, try double quotes:
PRAGMA table_info("for-China");
alter table "for-China" drop column note;
You should only need this for identifiers that are keywords or that use characters other than alphanumerics and underscore (and perhaps a few others).
SQLite also recognizes backticks and square brackets, as explained in the documentation.
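The quoting behaviour can be checked against plain SQLite from Python's sqlite3 module. This is a sketch assuming Spatialite behaves the same way here, and the table below is a hypothetical stand-in for the real layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the layer name from the question.
conn.execute('CREATE TABLE "for-China" (id INTEGER, note TEXT)')

try:
    conn.execute("SELECT * FROM for-China")  # unquoted: "-" is parsed as an operator
    unquoted_ok = True
except sqlite3.OperationalError as exc:
    unquoted_ok = False
    print("unquoted failed:", exc)

# Double quotes mark the whole name as a single identifier.
columns = [row[1] for row in conn.execute('PRAGMA table_info("for-China")')]
print(columns)  # ['id', 'note']
```

The sketch stops short of dropping the column because ALTER TABLE ... DROP COLUMN is only available in SQLite 3.35.0 and later. On an older build the statement fails no matter how the name is quoted.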

Can't update data in NVarChar column of a table in SSMS

I have a table in an SSMS database:
I am trying to update the contents of the "Name" column by removing the leading spaces from every entry. Following the question How to delete leading empty space in a SQL Database Table using MS SQL Server Managment Studio, I am therefore trying to run the following:
UPDATE ReferenceHierarchy set Name = LTRIM(Name)
The problem is that when I try to run it, it says "Name" is an invalid column. When I look at the code completion options for "Name", it sees the three fields ID, ParentID, and Sequence. Interestingly, these are the three non-NVarChar fields.
What could be the problem? And how can I fix it?
LTRIM only removes leading ordinary spaces. In your case the character may be a tab.
Try this for a tab:
UPDATE ReferenceHierarchy SET Name = LTRIM(REPLACE(Name, CHAR(9), ''))
If the character is neither a tab nor an ordinary space, it might be some other non-printing character.
Then try this:
UPDATE ReferenceHierarchy SET Name = TRIM(LTRIM(CASE WHEN Name NOT LIKE '[A-Za-z0-9]%'
THEN STUFF(Name, 1, 1, ' ') ELSE Name END))
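Before guessing which character to strip, it can help to look at the actual code points at the start of a value. A minimal Python sketch, with a made-up helper name and made-up sample values:

```python
def leading_code_points(value: str) -> list:
    """Return the code points of every leading character that is not a letter or digit."""
    points = []
    for ch in value:
        if ch.isalnum():
            break
        points.append(ord(ch))
    return points

# Hypothetical values: a tab, a non-breaking space and two ordinary spaces.
print(leading_code_points("\tRoot"))      # [9]
print(leading_code_points("\u00a0Root"))  # [160]
print(leading_code_points("  Root"))      # [32, 32]
```

In SQL Server itself, UNICODE(LEFT(Name, 1)) gives the same number for the first character, and that number then tells you which CHAR() or NCHAR() value to pass to REPLACE.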
TT's comment was the key to solving this one. I actually think there is an interesting moral here: if things are not working the way you think they should, perhaps your environment is not what you think it is. What was actually happening was that I had in SSMS an old version of the same database in which the ReferenceHierarchy table did not have a "Name" column. Without realizing it, I was running my query against that version. Running it against the correct version of the database solved my problem.

sql server conditional replace of csv data

I have a csv file that I am trying to import using BULK INSERT. The problem is that there is a field in the file that will be quoted (with double quotes) if a comma exists within the text (not quoted if no comma exists). The existence of the extra comma is causing SQL Server to throw errors because of an incorrect number of columns during the insert.
Here is a sample data set:
928 Riata Dr,Magnolia,TX,77354,4/15/2014
22 Roberts Ave.,McKinney,TX,75069,4/15/2014
"5531 Trinity Place, #22",San Antonio,TX,78212,4/15/2014
As you can see, the third row contains a comma within the address field, thus the address field is quoted. Since the BULK INSERT command is throwing errors because of this, I'm assuming I will need to scrub the file contents before attempting to load it.
Unless someone has a better solution
To scrub the file contents I will need to open the file (with SQL), read in the contents, and do a conditional replacement of the internal comma (found within the quotes). Since that comma doesn't really need to exist, I can just replace it with '' (blank).
Then, I can handle the quotes separately after the data gets imported with an update statement to replace any other characters I don't want.
I think the logic is sound, the problem is the syntax. I can't seem to find any syntax related to REGEX in SQL Server (Booo Microsoft). Which means I would need some other way to determine if the comma appears within quotes, and replace it if so.
Any thoughts, Suggestions, Code, etc.?
Thanks in advance.
This sounds too simple on the face of it, but if you can just replace the commas, can you open the csv in, say, Excel or OpenOffice Calc, and then do a find replace (commas with nothing)? I just tried with a csv of mine and it worked fine. The csv remains properly delimited.
Maybe I am missing something that prevents this, such as Excel opening this with extra cells due to the comma, in which case my answer is stupid. But it would make more sense to handle this in a spreadsheet app rather than after opening with SQL.
You may have to try delimiting with something other than commas, such as tabs. I've had to do this with SQL imports before. In many cases you can save the data as a tab-delimited txt file and upload that to SQL.
Note that using Excel for this type of thing can be its own problem. For help with Excel and tab delimited SQL imports, see my answer here.
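If a spreadsheet is not an option, the same conversion to tab-delimited text can be scripted. The sketch below uses Python's csv module, which already understands the "quoted only when the field contains a comma" convention. It works on the three sample rows from the question held in memory; reading from and writing to real files is left out.

```python
import csv
import io

# The three sample rows from the question, held in memory for the sketch.
raw = '''928 Riata Dr,Magnolia,TX,77354,4/15/2014
22 Roberts Ave.,McKinney,TX,75069,4/15/2014
"5531 Trinity Place, #22",San Antonio,TX,78212,4/15/2014
'''

rows = list(csv.reader(io.StringIO(raw)))  # quoted commas stay inside one field

# Write the rows back out with a tab between fields instead of a comma.
out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
print(out.getvalue())
```

Every row now has five tab-separated fields and the address keeps its comma, so the result should load with BULK INSERT using FIELDTERMINATOR = '\t', provided no field contains a tab itself.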