SQL: Copy text file to db with multiple character delimiter

I have a file that I cannot edit that uses :: as the delimiter. I'm using the following SQL statement to copy the contents of the file into a PostgreSQL database:
cur.execute("COPY table_name (col_1,col_2,col_3,col_4) FROM 'file_path' WITH DELIMITER ':'")
When I execute this code the program throws DataError: extra data after last expected column. This appears to happen because COPY treats every colon as a field separator, so it sees an empty extra column between each pair of colons. COPY only accepts a single one-byte delimiter, so '::' is not an option.
I'm wondering if there's a best practice for this situation. Maybe some way to ignore the second colon?
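One workaround (a sketch only; the staging table name and the E'\x01' delimiter byte are assumptions, while file_path and the column list come from the question) is to load each raw line into a single-column staging table using a delimiter byte that never occurs in the data, then split each line on '::' with split_part:

-- Stage each raw line as one text value; E'\x01' is assumed never
-- to appear in the file, so COPY never splits a line.
CREATE TEMP TABLE staging (line text);
COPY staging (line) FROM 'file_path' WITH DELIMITER E'\x01';

-- Split each line on the two-character '::' delimiter.
INSERT INTO table_name (col_1, col_2, col_3, col_4)
SELECT split_part(line, '::', 1),
       split_part(line, '::', 2),
       split_part(line, '::', 3),
       split_part(line, '::', 4)
FROM staging;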

Related

DB2 to SQL LinkedServer OpenQuery NonAscii Character Issue

So I've been scouring SO for an answer, and I've seen some great SQL functions that help remove non-ASCII characters from my db, but I wanted to post the entire question / process here first to see if maybe there is a fix upstream, in my select from DB2 into SQL Server.
What I'm doing: Getting data from a DB2 database into SQL Server
Issue: Non-ascii characters causing problems
Process: It's pretty simple. I have a SQL INSERT statement that selects a bunch of columns from a DB2 linked server using OPENQUERY:
insert into [table](stuff) select (stuff) From Openquery(SSF400,'select stuff from table')
However, in my SQL Server db, when editing the landed table, I'm getting weird trailing characters that appear as a space in a SELECT statement but are actually artifacts visible in SQL Edit mode.
I've tried using a few functions I found here on SO to strip these characters, but after running those functions I'm left with a mixture of Greek and English characters.
I'm thinking there must be a better way for me to do the initial insert other than using openquery so that the junk characters don't come over. I know SQL pretty well, but DB2 not so much...any advice?
Update: There does seem to be a junk character or two in the source system. Discovered using iNavigator. Also, source system is using db2 v7r3m0
Update: I tried the regexp expression mentioned in the comments in a query in iNavigator. Although several characters were removed, some do remain (comparing the original column against the cleansed column).
Cheers,
MD
I would try REGEXP_REPLACE(stuff,'[^\u0020-\u007E\u0009\u000A\u000D]+',''), which will remove everything that is not a character from the 7-bit ASCII set, and also removes any 7-bit ASCII control characters apart from Tab, New Line, and Carriage Return. It also removes DEL.
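If the replacement runs on the DB2 side, it can go straight into the OPENQUERY pass-through so the junk characters never land in SQL Server. A rough sketch, reusing the question's placeholder names and assuming REGEXP_REPLACE is available on the source release (the question mentions v7r3m0, which should include it); note the doubled single quotes inside the OPENQUERY string:

insert into [table] (stuff)
select stuff
from OPENQUERY(SSF400,
    'select REGEXP_REPLACE(stuff, ''[^\u0020-\u007E\u0009\u000A\u000D]+'', '''') as stuff
     from table')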

bcp query skip comma

The issue I am having is that I have data in SQL columns which contains "," in the data.
Example column value: Text,Text2,text3
I want to skip the "," so it does not affect my data; otherwise it just messes up the output.
EXEC xp_cmdshell 'bcp "SELECT * FROM details" queryout "\\TEst\Share\32\123Test.csv" /c /t, -T'
Since this file has quoted values and the quoted values are quoted on every record (not JUST the values that contain commas) you can use this solution.
Essentially you need to use a BCP format file with the -f option. With the format file you can specify custom field delimiters. So, in your case, the delimiters will be customized to no longer be just a comma (,) but will now include the quotes as well (",").
It gets a little tricky if the first or last fields are quoted, but that can be done too. See here:
SQL Server BCP Bulk insert Pipe delimited with text qualifier format file
or here:
SQL Server BCP Export where comma in SQL field
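For illustration only (the version line, column names, field count, lengths, and collation below are all invented for the example), a non-XML format file along those lines might use a zero-length dummy first field to consume the leading quote, then terminate each real field on the quote-comma-quote sequence:

12.0
4
1   SQLCHAR   0   0     "\""       0   FIRST_QUOTE   ""
2   SQLCHAR   0   100   "\",\""    1   col1          SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0   100   "\",\""    2   col2          SQL_Latin1_General_CP1_CI_AS
4   SQLCHAR   0   100   "\"\r\n"   3   col3          SQL_Latin1_General_CP1_CI_AS

The file would then be passed to bcp with -f format_file.fmt in place of the /c /t, switches.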

INSERT Statement in SQL Server Strips Characters, but using nchar(xxx) works - why?

I have to store some strange characters in my SQL Server DB which are used by an Epson Receipt Printer code page.
Using an INSERT statement, all are stored correctly except one - [SCI] (nchar(154)). I realise that this is a control character that isn't representable in a string, but the character is replaced by a '?' in the stored DB string, suggesting that it is being parsed (unsuccessfully) somewhere.
The collation of the database is LATIN1_GENERAL_CI_AS so it should be able to cope with it.
So, for example, if I run this INSERT:
INSERT INTO Table(col1) VALUES ('abc[SCI]123')
Where [SCI] is the character, a resulting SELECT query will return 'abc?123'.
However, if I use NCHAR(154), by directly inserting or by using a REPLACE command such as:
UPDATE Table SET col1 = REPLACE(col1, '?', NCHAR(154))
The character is stored correctly.
My question is, why? And how can I store it directly from an INSERT statement? The latter is preferable as I am writing from an existing application that produces the INSERT statement that I don't really want to have to change.
Thank you in advance for any information that may be useful.
When you write a literal string in SQL Server it is treated as VARCHAR unless you prefix it with N. This means that any characters outside the database's code page are lost (stored as '?'). Instead, write your INSERT statement like this:
INSERT INTO Table(col1) VALUES (N'abc[SCI]123')
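Note, too, that the target column must itself be a Unicode type (NCHAR or NVARCHAR), or the character is converted away again when it is stored. An illustrative sketch, with a made-up table name:

-- Hypothetical table; the point is the NVARCHAR column type.
CREATE TABLE Receipts (col1 NVARCHAR(100));
INSERT INTO Receipts (col1) VALUES (N'abc' + NCHAR(154) + N'123');
SELECT col1 FROM Receipts;  -- the control character survives intact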

sql server conditional replace of csv data

I have a csv file that I am trying to import using BULK INSERT. The problem is that there is a field in the file that will be quoted (with double quotes) if a comma exists within the text (not quoted if no comma exists). The existence of the extra comma is causing SQL Server to throw errors because of an incorrect number of columns during the insert.
Here is a sample data set:
928 Riata Dr,Magnolia,TX,77354,4/15/2014
22 Roberts Ave.,McKinney,TX,75069,4/15/2014
"5531 Trinity Place, #22",San Antonio,TX,78212,4/15/2014
As you can see, the third row contains a comma within the address field, thus the address field is quoted. Since the BULK INSERT command is throwing errors because of this, I'm assuming I will need to scrub the file contents before attempting to load it.
Unless someone has a better solution, to scrub the file contents I will need to open the file (with SQL), read in the contents, and do a conditional replacement of the internal comma (the one found within the quotes). Since that comma doesn't really need to exist, I can just replace it with '' (blank).
Then, I can handle the quotes separately after the data gets imported with an update statement to replace any other characters I don't want.
I think the logic is sound; the problem is the syntax. I can't seem to find any syntax related to REGEX in SQL Server (Booo Microsoft), which means I would need some other way to determine whether a comma appears within quotes, and replace it if so.
Any thoughts, Suggestions, Code, etc.?
Thanks in advance.
This sounds too simple on the face of it, but if you can just replace the commas, can you open the csv in, say, Excel or OpenOffice Calc, and then do a find replace (commas with nothing)? I just tried with a csv of mine and it worked fine. The csv remains properly delimited.
Maybe I am missing something that prevents this, such as Excel opening this with extra cells due to the comma, in which case my answer is stupid. But it would make more sense to handle this in a spreadsheet app rather than after opening with SQL.
You may have to try delimiting with something other than commas, such as tabs or etc. I've had to do this with SQL imports before. In many cases you can save as a tab delimited txt file and upload to SQL.
Note that using Excel for this type of thing can be its own problem. For help with Excel and tab delimited SQL imports, see my answer here.
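As a side note, on newer versions of SQL Server (2017 and later) BULK INSERT can parse this kind of quoting natively, which avoids the scrubbing step altogether. A minimal sketch, with a placeholder table and file path:

-- SQL Server 2017+ only; dbo.Addresses and the path are placeholders.
BULK INSERT dbo.Addresses
FROM 'C:\data\addresses.csv'
WITH (
    FORMAT = 'CSV',          -- RFC 4180-style CSV parsing
    FIELDQUOTE = '"',        -- honors quotes around fields containing commas
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);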

Issues with Chr(0) in SQL INSERT script

We currently use the SQL Publishing Wizard to back up our database schemas and data; however, we have some database tables with hashed passwords that contain the null character (chr(0)). When SQL Publishing Wizard generates the insert data scripts, the null character causes errors when we try to run the resulting SQL: it appears to ignore ALL TEXT after the first instance of this character in a script. We recently tried out Redgate SQL Compare and found that it has the same issue with this character. I have confirmed it is ASCII character code 0 by running the ASCII() SQL function against the offending record.
A sample of the error we are getting is:
Unclosed quotation mark after the character string '??`????{??0???
The fun part is, I can't really paste a sample Insert statement because of course everything that appears after the CHR(0) is being omitted when pasting!
Change the definition of the column to VARBINARY. The data you store in there doesn't seem to be appropriate VARCHAR to start with.
This will ripple through the code that uses the column, as you'll get a byte[] CLR type back in the client, and you should change your insert/update code accordingly. But after all, a password hash is a byte[], not a string.
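A sketch of such a migration, with hypothetical table and column names; SQL Server will not implicitly convert VARCHAR to VARBINARY in an ALTER COLUMN, so the data is copied through an explicit CAST instead:

-- Hypothetical names; size the VARBINARY to the hash length.
ALTER TABLE dbo.Users ADD PasswordHashBin VARBINARY(64);

UPDATE dbo.Users
SET PasswordHashBin = CAST(PasswordHash AS VARBINARY(64));

-- After verifying the copy, drop the old column and rename.
ALTER TABLE dbo.Users DROP COLUMN PasswordHash;
EXEC sp_rename 'dbo.Users.PasswordHashBin', 'PasswordHash', 'COLUMN';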