I have a CSV file that I am trying to import using BULK INSERT. The problem is that one field in the file is quoted (with double quotes) whenever the text contains a comma (and left unquoted otherwise). The extra comma causes SQL Server to throw errors about an incorrect number of columns during the insert.
Here is a sample data set:
928 Riata Dr,Magnolia,TX,77354,4/15/2014
22 Roberts Ave.,McKinney,TX,75069,4/15/2014
"5531 Trinity Place, #22",San Antonio,TX,78212,4/15/2014
As you can see, the third row contains a comma within the address field, thus the address field is quoted. Since the BULK INSERT command is throwing errors because of this, I'm assuming I will need to scrub the file contents before attempting to load it.
Unless someone has a better solution.
To scrub the file contents I will need to open the file (with SQL), read in the contents, and do a conditional replacement of the internal comma (found within the quotes). Since that comma doesn't really need to exist, I can just replace it with '' (blank).
Then I can handle the quotes separately after the data gets imported, with an UPDATE statement to replace any other characters I don't want.
I think the logic is sound; the problem is the syntax. I can't seem to find any syntax related to REGEX in SQL Server (Booo Microsoft). That means I would need some other way to determine whether the comma appears within quotes, and to replace it if so.
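Something along these lines is what I have in mind, purely as a sketch: a made-up file path, OPENROWSET to pull the whole file into a variable, and a simple character-by-character walk to track whether I'm inside quotes.

-- Sketch only: placeholder file path; fine for a one-off scrub, slow on big files.
DECLARE @raw NVARCHAR(MAX), @clean NVARCHAR(MAX) = N'';
DECLARE @i INT = 1, @c NCHAR(1), @inQuotes BIT = 0;

SELECT @raw = BulkColumn
FROM OPENROWSET(BULK 'C:\import\addresses.csv', SINGLE_CLOB) AS f;

WHILE @i <= LEN(@raw)
BEGIN
    SET @c = SUBSTRING(@raw, @i, 1);
    IF @c = N'"' SET @inQuotes = 1 - @inQuotes;      -- toggle quote state
    IF NOT (@c = N',' AND @inQuotes = 1)
        SET @clean = @clean + @c;                    -- keep everything except commas inside quotes
    SET @i = @i + 1;
END
-- @clean could then be written back out and bulk loaded as usual.

This keeps the quote characters themselves; stripping them after the import, as described above, would still apply.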
Any thoughts, suggestions, code, etc.?
Thanks in advance.
This sounds too simple on the face of it, but if you can just drop the commas, can you open the CSV in, say, Excel or OpenOffice Calc and do a find-and-replace (commas with nothing)? I just tried it with a CSV of mine and it worked fine; the CSV remained properly delimited.
Maybe I am missing something that prevents this, such as Excel splitting the quoted field into extra cells because of the comma, in which case my answer is stupid. But it would make more sense to handle this in a spreadsheet app than after opening it with SQL.
You may have to try delimiting with something other than commas, such as tabs. I've had to do this with SQL imports before. In many cases you can save the file as a tab-delimited .txt file and upload that to SQL Server.
Note that using Excel for this type of thing can be its own problem. For help with Excel and tab-delimited SQL imports, see my answer here.
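If you go the tab-delimited route, the load itself is then a plain BULK INSERT with a tab field terminator; the table and file names below are placeholders:

-- Placeholder table and file names.
BULK INSERT dbo.Addresses
FROM 'C:\import\addresses.txt'
WITH (
    FIELDTERMINATOR = '\t',   -- tab instead of comma
    ROWTERMINATOR = '\n'
);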
So I've been scouring SO for answers, and I've seen some great SQL functions for stripping non-ASCII characters from a database, but I wanted to post the entire question/process here first to see if maybe there is a fix upstream, in my select from DB2 into SQL Server.
What I'm doing: Getting data from a DB2 database into SQL Server
Issue: Non-ASCII characters causing problems
Process: It's pretty simple. I have a SQL INSERT statement that selects a bunch of columns from a DB2 linked server using OPENQUERY:
insert into [table](stuff) select (stuff) From Openquery(SSF400,'select stuff from table')
However, in my SQL Server database, when editing the landed table, I'm getting weird trailing characters that appear as a space in a SQL SELECT statement but are actually artifacts visible in SSMS edit mode.
I've tried using a few functions I found here on SO to strip these characters, but after running them I'm left with a combination of Greek/English characters.
I'm thinking there must be a better way to do the initial insert, other than using OPENQUERY, so that the junk characters don't come over in the first place. I know SQL pretty well, but DB2 not so much... any advice?
Update: There does seem to be a junk character or two in the source system, discovered using iNavigator. Also, the source system is running DB2 at V7R3M0.
Update: Here is a screenshot of the REGEXP expression mentioned in the comments, used in a query in iNavigator. Although several characters were removed, some do remain. The original column is on the left, the cleansed column is on the right.
Cheers,
MD
I would try REGEXP_REPLACE(stuff, '[^\u0020-\u007E\u0009\u000A\u000D]+', ''), which removes everything that is not a character from the 7-bit ASCII set, but also removes any 7-bit ASCII control characters apart from tab, new line, and carriage return. It also removes DEL.
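Applied to the pass-through query so the cleanup runs on the DB2 side before the rows land in SQL Server, it would look roughly like this (table and column names are placeholders):

-- Placeholder table/column names; note the doubled single quotes inside the OPENQUERY string.
INSERT INTO [landing_table] (col1, col2)
SELECT col1, col2
FROM OPENQUERY(SSF400,
    'SELECT REGEXP_REPLACE(col1, ''[^\u0020-\u007E\u0009\u000A\u000D]+'', '''') AS col1,
            col2
     FROM   some_library.some_table');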
I have created a variable of type table inside a stored procedure. At the end of the procedure I select all the rows in the table variable and display them. When I right-click on the headers and select "Save As", it allows me to change the type to All Files and save the output as a text file. This works fine, except that the columns containing NULLs are saved with the literal text NULL. I want the NULLs filled in with spaces instead.
I've been trying to find a way to create a file from a stored procedure, but most answers point to SSIS, and I can't figure out how to use SSIS with a table variable instead of an actual table.
If I could either replace NULLs with spaces or use a stored procedure to do the same thing, that would be great. I cannot use tab- or comma-delimited output, as the final product has to be a fixed-width flat file in which each column uses the same number of characters as declared in the column headers, padded with spaces.
Thanks for any help you are able to offer.
Cheers
P.S. I am using SQL Server 2012 Management Studio
The easy way to do this would be to convert the NULLs to spaces in your SELECT statement.
SELECT COALESCE(yourcolumn, '')
Wrap COALESCE around every column that has NULLs in it.
Using COALESCE article link
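Since the output has to be fixed-width, you can combine the COALESCE with a cast to CHAR(n), which right-pads values with spaces; the column names and widths here are invented:

-- Invented names/widths; CHAR(n) pads each value out to n characters with trailing spaces,
-- and COALESCE turns NULLs into empty strings first so they come out as all spaces.
SELECT CAST(COALESCE(FirstName, '') AS CHAR(20))
     + CAST(COALESCE(LastName,  '') AS CHAR(30))
     + CAST(COALESCE(City,      '') AS CHAR(25)) AS FixedWidthRow
FROM @MyTableVariable;   -- the table variable declared in your stored procedure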
If the last thing you do in the stored procedure is SELECT * FROM the table variable, then you can use that SP in an OLE DB Source component. Change the data access mode from Table or View to SQL Command and use the EXEC sp_SomeName syntax. This creates a pipeline you can connect to a destination component, such as a Flat File Destination.
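In outline (procedure, table, and column names are made up), the procedure and the SQL command text look something like this:

-- Made-up names. SET NOCOUNT ON keeps extra "rows affected" messages out of the output,
-- and the final SELECT from the table variable is the rowset the OLE DB source reads.
CREATE PROCEDURE dbo.usp_ExportReport
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Results TABLE (FirstName VARCHAR(20), LastName VARCHAR(30));

    INSERT INTO @Results (FirstName, LastName)
    SELECT FirstName, LastName
    FROM dbo.SomeSourceTable;

    SELECT * FROM @Results;
END
GO

-- In the OLE DB source, set the data access mode to "SQL command" and use:
EXEC dbo.usp_ExportReport;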
I have seen many issues over the years doing Save Results As... I will only use this for informal 'quick check' files and not for anything considered 'live' or 'production' data.
Here is a good blog that also shows how to use parameters.
http://geekswithblogs.net/stun/archive/2009/03/05/mapping-stored-procedure-parameters-in-ssis-ole-db-source-editor.aspx
I am new to SQL and just know basic insert, update, and delete syntax.
I have an Excel file that I imported into SQL Server, but somehow it brought in weird symbols and characters.
When I checked the Excel file, cleared all formatting, and re-uploaded it, the characters still showed up, and I'm not sure how to clean them up.
Is there an easy replace syntax that you can suggest for me to use to do a global cleanup?
Sample values inside the columns are:
Lisa Hettinger  Cherry Creek Prop
Lisa J Hernandez
I would need to remove the weird ┬ and á characters.
If all you have are the "┬á" characters, you can try using the REPLACE command like this:
SELECT REPLACE(N'Lisa┬áJ┬áHernandez', N'┬á', N' ')
I suspect the source code page has "┬á" as its space character.
Update: To update the values in the entire column, you can use
UPDATE [MyTableName] SET [MyColName] = REPLACE([MyColName], N'┬á', N' ');
I am writing an SQL statement into a regular Excel sheet (not VBA), which can be copied and pasted from a cell into SQL Server Management Studio.
In order for it to work, however, I need to declare variables (for example @userid). Usually, Excel has no problem with me using variables, but when I put them into an IF statement, it tells me it's an 'invalid function'.
For instance, one part of my SQL query looks like this:
="VALUES("&IF(Keys!B5<>"", Keys!B5, "#spid")&", #sid, GETDATE())"
For now I've had to add quotes around the variables. Unfortunately this means that, when pasting into SQL Server Management Studio, you then have to remove the quotes around the variables again in order for them to be recognised as variables and not strings. This is inconvenient, as my spreadsheet is supposed to be designed to make adding records quicker and easier.
Is there any way I can work around this?
Here are a couple of possible work-arounds you might be able to put to use.
1. Use '@spid in Keys!B5. The single tick, used as the Range.PrefixCharacter, will not be added to the concatenated string.
2. Use spid in Keys!B5 and put the @ in the formula string parts: ="VALUES(@"&IF(Keys!B5<>"", Keys!B5, "spid")&", @sid, GETDATE())"
3. Same as 2, but format Keys!B5 with a custom number format mask of \@@. The \ is an escape character that makes the first @ a literal. This will show the value as @spid when the cell actually contains spid.
Background: I have to create a report that will be run regularly and sent to an external entity. It calls for a comma-delimited text file. Certain fields required for the report contain commas (I can easily parse the commas out of the name fields, but errant commas in the address and certain number fields are trickier). I have no control over the database design or input controls.
I know how to get a comma-delimited text file of query results from SQL Server Management Studio. But the commas in the fields screw everything up. I can change the delimiting character and then get the fields right in Excel, but that's just a workaround - it needs to be able to meet specifications automatically.
This report previously ran on an antiquated DBMS - I have a copy of an old report, and the fields are all enclosed in double quotes ("...."). This would work - though I don't know how the external users parse the fields (not my problem) - but I'm too dumb to figure out how to do it in T-SQL.
Thanks!
You can use the Export Data task, but if you must get these results from Management Studio after running a query, go to Tools > Options, find the settings for grid results (Query Results > SQL Server > Results to Grid), and check the box to quote strings containing list separators when saving .csv results. This option only takes effect in query windows opened after you change it.
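If you would rather produce the quoting in T-SQL itself, so the query output already meets the spec, one sketch (the column and table names are invented) is to wrap each text field in double quotes and double up any quotes embedded in the data:

-- Invented names. QUOTENAME(value, '"') wraps a value in double quotes and doubles any
-- embedded double quotes, but it returns NULL for values longer than 128 characters,
-- so the manual REPLACE form is shown for the longer address field.
SELECT QUOTENAME(LastName, '"')  + ',' +
       QUOTENAME(FirstName, '"') + ',' +
       '"' + REPLACE(StreetAddress, '"', '""') + '"' AS CsvRow
FROM dbo.ReportSource;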