Importing a large CSV file into Postgres: unquoted carriage return error

I am trying to import a large CSV file using COPY, but I keep getting this error:
ERROR: unquoted carriage return found in data
HINT: Use quoted CSV field to represent carriage return.
CONTEXT: COPY nyc_yellow_taxi_trips_2018_01, line 2
SQL state: 22P04
I know it is due to the blank row right under the header, so I tried deleting it manually by opening the file in TextEdit. I also tried opening it in Excel, but the file is too big to edit there. After deleting the blank line in TextEdit there was no blank row left, yet I am still getting this error. It is most likely an easy fix, but I have been stuck on this for a while now.
Here is my code:
COPY nyc_yellow_taxi_trips_2018_01
FROM '/Users/eddy/taxi/yellow_tripdata_2018-01.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');

It looks like you have inconsistent line endings. COPY found a carriage return, but it expected (based on what ended the header line) either just a newline (LF) or a CRLF pair.
You need to make the line endings consistent, which I don't know how to do using TextEdit.
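If TextEdit can't do it, a short script can normalize the line endings before running COPY. Here is a minimal sketch in Python, assuming the path from the question; note that it reads the whole file into memory, and that it also rewrites carriage returns inside quoted fields, which is only safe if your data contains none:

src_path = '/Users/eddy/taxi/yellow_tripdata_2018-01.csv'
dst_path = '/Users/eddy/taxi/yellow_tripdata_2018-01_lf.csv'

# Rewrite every CRLF pair and every bare CR as a plain LF, so all line
# endings match what COPY inferred from the header line.
with open(src_path, 'rb') as src:
    data = src.read()

data = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')

with open(dst_path, 'wb') as dst:
    dst.write(data)

After that, point the COPY command at the rewritten file.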

Related

How to clean a txt file having different delimiters using an SSIS package?

I have a text file that uses ^ (caret) and , (comma) as delimiters, and after cleaning it I need to load it into SQL. I have tried my best to clean the source file, but it is still not cleaned as expected. (The original post illustrated the attempted correction and the uncleaned file with screenshots.)
You have a variety of issues here.
You have identified the header row delimiter as a comma. A row delimiter is the, usually invisible, delimiter that indicates a row's worth of data has ended. Traditionally this is an operating-system-specific value: a carriage return (CR), a line feed (LF), or a carriage return/line feed pair (CRLF).
Your source data is not a comma-delimited file with caret/circumflex/cap text delimiters. You have a comma-space delimited file, which SSIS doesn't support in the editor. However, if you hand-edit the dtsx file, as I outlined in "How to read a flatfile with lowercase thorn as the delimiter", you can specify that it should use comma space: ColumnDelimiter="_x002C__x0020_".
Given a truncated version of your source data
ListCode, CAS, Name
^216^, ^^, ^Coal Dust^
^216^, ^7782-24-5^, ^Graphite (Natural)^
^216^, ^^, ^Inert or Nuisance Dust^
and the comma (0x2C) space (0x20) edited into the raw dtsx connection manager, I was able to pull data as I believe you are expecting
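As a cross-check of that reading of the format (not part of the SSIS fix itself), any CSV reader that treats the comma as the delimiter, the caret as the text qualifier, and skips the space after each comma parses the sample cleanly. A small Python sketch:

import csv
import io

# Parse the truncated sample: ',' as delimiter, '^' as text qualifier,
# skipinitialspace drops the space that follows each comma.
sample = (
    'ListCode, CAS, Name\n'
    '^216^, ^^, ^Coal Dust^\n'
    '^216^, ^7782-24-5^, ^Graphite (Natural)^\n'
    '^216^, ^^, ^Inert or Nuisance Dust^\n'
)
reader = csv.reader(io.StringIO(sample), delimiter=',',
                    quotechar='^', skipinitialspace=True)
for row in reader:
    print(row)   # e.g. ['216', '', 'Coal Dust']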
You might also run into additional issues given your selection of code pages and not checking the Unicode button but that's beyond my ability to generate matching source data from an image.
Just replace the `^, ^` with `^,^`.
It looks like your source is:
CAS, SubName, ListCode, Type, CountryCode, ListName
^1000413-72-8^,^fasiglifam^,^447^,^Chemical Inventory^,^EU^,^ECICS Custom Tariff Codes^
^1000413-72-8^,^fasiglifam^,^0^,^^,^NN^,^SPHERA Global Substance List^
Then edit your connection manager to match (the original answer showed the settings in a screenshot: https://i.stack.imgur.com/0x89k.png). It will work.

Combining SQL files with the `copy` command in a batch file introduces an incorrect-syntax error because it adds an invisible character `U+FEFF`

In a pre-build event, a batch file is executed to combine multiple SQL files into a single one.
It is done using this command:
COPY %#ProjectDir%\Migrations\*.sql %#ProjectDir%ContinuousDeployment\AllFilesMergedTogether.sql
Everything appears to work fine, but somehow the result gives an incorrect-syntax error.
After two hours of investigation, it turned out the issue is caused by an invisible character that remains invisible even in Notepad++.
Using an online website, the character was spotted and identified as U+FEFF (the byte order mark).
Here are the two input scripts.
PRINT 'Script1'
PRINT 'Script2'
Here is the output given by the copy command (the U+FEFF inserted between the two scripts is invisible here):
PRINT 'Script1'
PRINT 'Script2'
Additional info:
Batch file is encoded with UTF-8
Input files are encoded with UTF-8-BOM
Output file is encoded with UTF-8-BOM.
I'm not sure it is possible to change the output encoding of the copy command; I've tried and failed.
What should be done to eradicate this extremely frustrating parasitic character?
It turned out that changing the encoding of the input files to ANSI fixes the issue.
No more pesky character(s).
Also, doing so changes the encoding of the result file to UTF-8 instead of UTF-8-BOM, which I believe is an improvement.
Encoding can be changed using Notepad++ via its Encoding menu (e.g. Encoding > Convert to ANSI).
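If changing the source encoding is not an option, another route is to do the concatenation with a small script that strips the BOM from each input file. A hedged sketch in Python; the folder and file names mirror the batch command, everything else is an assumption:

import glob

# Concatenate the migration scripts, dropping the U+FEFF byte order mark
# from each input; the 'utf-8-sig' codec reads UTF-8 with or without a
# BOM and strips it, so none leaks into the merged output.
out_path = 'ContinuousDeployment/AllFilesMergedTogether.sql'
with open(out_path, 'w', encoding='utf-8') as out:
    for path in sorted(glob.glob('Migrations/*.sql')):
        with open(path, 'r', encoding='utf-8-sig') as sql_file:
            out.write(sql_file.read())
            out.write('\n')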

Line contains invalid enclosed character data or delimiter at position

I was trying to load data from a CSV file into Oracle using SQL Developer, and when inserting the data I encountered this error:
Line contains invalid enclosed character data or delimiter at position
I am not sure how to tackle this problem!
For Example:
INSERT INTO PROJECT_LIST (Project_Number, Name, Manager, Projects_M,
Project_Type, In_progress, at_deck, Start_Date, release_date, For_work, nbr,
List, Expenses) VALUES ('5770','"Program Cardinal
(Agile)','','','','','',to_date('', 'YYYY-MM-DD'),'','','','','');
The errors shown were:
--Insert failed for row 4
--Line contains invalid enclosed character data or delimiter at position 79.
--Row 4
I've had success converting the CSV file to Excel with "Save As", changing the format to .xlsx, and then loading the .xlsx version in SQL Developer. I think the conversion forces some of the bad formatting out. It worked on at least my last two files.
I fixed it by using the CONCATENATE function in my CSV file first and then uploading it to SQL, which worked.
My guess is that it doesn't like to_date('', 'YYYY-MM-DD'): it is missing a date to format. Is that an actual input in your data?
But it could also be the stray double quote in "Program Cardinal (Agile). Though I don't see why that would get picked up as an invalid character.
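If you want to locate the offending rows before importing, a quick scan for unbalanced quotes usually finds them. A hedged Python sketch (the file name is made up, and legitimately quoted multi-line fields would also trip the check):

# Flag rows with an odd number of double quotes; an unbalanced quote is
# a common cause of "invalid enclosed character data" errors on import.
with open('project_list.csv', encoding='utf-8', newline='') as f:
    for lineno, line in enumerate(f, start=1):
        if line.count('"') % 2 != 0:
            print(f'line {lineno}: unbalanced quote -> {line.rstrip()}')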

Removing/handling newlines in a simple text import class

I have an input file where I want to use the string Split function on each line, keyed on the Type field. However, the Description field sometimes contains embedded newlines, which breaks my reader since it uses StreamReader's ReadLine() function.
Handled:
Type|Name|User|Description
Type|Name|User|Description
Unhandled:
Type|Name|User|Description line 1
Description Line 2
Type|Name|User|Description
Short of validating on 'Type' for each line and reading ahead until the next Type field appears, are there any other ways folks can come up with to properly read this file?
My solution was to have the file maker replace newline characters in the Description field with another unique character that I can add back in later. I'm still interested in solutions from the file reader's perspective, though.
I know I'm talking to myself a lot here, but I found another solution, which is to remove the line feeds, since the output file's creator wrote out carriage returns to end each line.
You could easily add a conditional statement checking whether the Split array contains more than one element, which would indicate that it's a line you want to parse as a new record.
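Putting that field-count check together, here is a reader-side sketch (in Python for brevity; the four-field layout comes from the samples above, while the file name and the choice to rejoin with '\n' are assumptions):

records = []
with open('export.txt', encoding='utf-8') as f:
    for raw in f:
        line = raw.rstrip('\r\n')
        parts = line.split('|')
        if len(parts) == 4:      # a real Type|Name|User|Description row
            records.append(parts)
        elif records:            # a continuation of the previous Description
            records[-1][3] += '\n' + line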

CSV Carriage Return Character

I have a CSV output on one of my applications. This produces a file from web form data.
In some cases I am getting a carriage return character in my notes field. This causes an error when importing the file. I would like to remove this character.
The issue appears to be happening when users paste information into the form from word documents or holding down the shift key and pressing enter.
The field is ntext and is populated from a multi-line text box control.
I have been trying to remove this with a replace function but some carriage return characters seem to be getting through.
SQL:
REPLACE(Fieldname, CHAR(13) + CHAR(10), ' ') AS new_Fieldname
It may be best to replace the characters separately, as they do not always occur together or in that order:
REPLACE(REPLACE(Fieldname, CHAR(13),' '), CHAR(10), ' ') AS new_Fieldname
Note that you may have a carriage return + line feed, or just a carriage return (depending on the source platform, the source of the data etc.). So you will probably need to handle both cases.
You can read CSVs with carriage returns in them. The carriage return should be in a quoted field (i.e. surrounded by quotes). This allows you to read lines and include the breaks in your field. If you are reading your CSV one line at a time, you need to maintain state between lines and append the data as necessary.
In .NET, the easiest way to read a CSV is using the Microsoft.VisualBasic.FileIO.TextFieldParser class (yes, you can use this in C# if you add a reference to Microsoft.VisualBasic). It reads even the nastiest CSVs I've thrown at it with ease.
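To illustrate the quoted-field point (a made-up two-row sample, sketched in Python rather than .NET):

import csv
import io

# A carriage return inside a quoted field stays inside the field; the
# reader does not start a new record at the embedded CR.
data = 'id,notes\r\n1,"line one\rline two"\r\n'
for row in csv.reader(io.StringIO(data, newline='')):
    print(row)
# -> ['id', 'notes']
# -> ['1', 'line one\rline two']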
In Word, there are different kinds of new-line characters, so you may also need to search for and replace the other ones.
I'm not sure what all the possibilities are; the paragraph mark is at least one that I know of.
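For reference, a hedged checklist of characters worth scrubbing when text was pasted from Word; the vertical tab is what Word's Shift+Enter soft line break commonly becomes when pasted, and the Unicode separators are rarer but possible:

# Candidate characters to strip or replace when cleaning pasted Word text.
CANDIDATES = {
    '\r': 'carriage return',
    '\n': 'line feed',
    '\x0b': 'vertical tab (Word soft line break)',
    '\u2028': 'Unicode line separator',
    '\u2029': 'Unicode paragraph separator',
}

def scrub(text: str) -> str:
    for ch in CANDIDATES:
        text = text.replace(ch, ' ')
    return text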