There are some escape characters in the source files (JSON in Azure Blob) – \" and \n. When I transform the files through a copy activity or a data flow, these characters get parsed and I end up with just a " and a new line in the sink file. While I do want the \" to be parsed to ", I need the \n to stay as a literal string and not become a new line. Is there some way I can achieve this? The quote character and escape character options in the Delimited text dataset are not helping, since I need the quote character setting configured to no quote character.
This is how a sample JSON looks-
"Value" : "<xmlns:xsd="http://www.w3.org/2001/XMLSchema" \n some data \n some data>"
In the copy activity or data flow, the source looks like this in preview mode-
<xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 some data
 some data>
I need something like this-
<xmlns:xsd="http://www.w3.org/2001/XMLSchema" \n some data \n some data>
So for my scenario, the \n needs to be preserved and not get parsed as a new line. I have tried adding \\n in the source and that worked, but sadly changing things on the source side is not possible. Can I do something like this in Synapse itself?
You can consider a data flow with a derived column transformation that reads all the data as a single column and replaces \n with \\n in the source file first. Once the source file is updated, use the copy activity.
Or
You can consider deploying code in Azure Functions that updates the source file by replacing \n with \\n first, and then use the copy activity to copy the file.
I am running the following query:
select concat('{','"name"',':', chr(34), str,
chr(34), ', ','"type"',':','"string"','},') jsonl
from (select 'part_number' as str)
which results in:
{"name":"part_number", "type":"string"},
and this is the expected result.
But when I save the results to a CSV file, the results look different. The issue is the extra double quotation marks surrounding each element. Any idea what is causing this discrepancy?
btw, my local machine is running Windows 11.
This is the common way of escaping quotes inside strings in CSV files. Try opening the file in Excel or LibreOffice Calc; it should look as expected.
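For example, assuming the file is written with standard CSV quoting, the single value
{"name":"part_number", "type":"string"},
is stored in the raw file as
"{""name"":""part_number"", ""type"":""string""},"
The whole field is wrapped in double quotes and each embedded quote is doubled, and any CSV-aware reader turns it back into the original value.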
The Redshift UNLOAD command is replacing " with "".
Example:
UNLOAD($$ select '"Jane"' as name $$)
TO 's3://s3-bucket/test_'
iam_role 'arn:aws:iam::xxxxxx:role/xxxxxx'
HEADER
CSV
DELIMITER ','
ALLOWOVERWRITE
The output looks like: ""Jane""
If I run the same command with select 'Jane' as name, the output has no quotes at all, just Jane. But I need the output to be "Jane".
You are asking for the unloaded file to be in CSV format, and the CSV format says that if you want a double quote in your data you need to escape it with another double quote. See https://datatracker.ietf.org/doc/html/rfc4180
So Redshift is doing exactly as you requested. Now, if you just want a comma-delimited file, then you don't want to use "CSV", as this will add all the characters necessary to make the file fully compliant with the CSV specification.
This choice will come down to what tool or tools are reading the file and whether they expect an RFC-compliant CSV or just a simple file where fields are separated by commas.
This is a gripe of mine - tools that say they read CSV but don't follow the spec. If you say CSV then follow the format. Or call what you read something different, like CDV - comma delimited values.
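As a rough sketch of the second option (reusing the placeholder bucket and role from the question, and assuming the header line is still wanted), dropping the CSV keyword and keeping only the delimiter writes the quotes through unchanged:
UNLOAD($$ select '"Jane"' as name $$)
TO 's3://s3-bucket/test_'
iam_role 'arn:aws:iam::xxxxxx:role/xxxxxx'
HEADER
DELIMITER ','
ALLOWOVERWRITE
With this, "Jane" is written as-is, but the file is no longer RFC 4180 compliant, so a field containing a comma, quote, or newline will not be escaped for you.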
I am trying to import a large csv file using COPY, but I keep getting this error code.
ERROR: unquoted carriage return found in data
HINT: Use quoted CSV field to represent carriage return.
CONTEXT: COPY nyc_yellow_taxi_trips_2018_01, line 2
SQL state: 22P04
I know it is due to the blank row right under the header, so I tried manually deleting it by opening the file in TextEdit. I also tried opening it in Excel, but the file is too big to edit there. Even after removing the blank row in TextEdit, I am still getting this error. It is most likely an easy fix, but I have been stuck on this for a while now.
Here is my code:
COPY nyc_yellow_taxi_trips_2018_01
FROM '/Users/eddy/taxi/yellow_tripdata_2018-01.csv'
WITH (FORMAT CSV, HEADER, DELIMITER ',');
It looks like you have inconsistent line endings. It has found a carriage return, but it expected (based on what ended the header line) either just a newline (LF) or a carriage return plus newline (CRLF).
You need to make the line endings consistent, which I don't know how to do using TextEdit.
I have a fairly large .txt file, ~9 GB, and I would like to load it into Postgres. The first row is the header, followed by all the data. If I COPY the data directly, the header causes an error because its values do not match the data types of my Postgres table, so I need to remove it somehow.
Sample data:
ProjectId,MailId,MailCodeId,prospectid,listid,datemailed,amount,donated,zip,zip4,VectorMajor,VectorMinor,packageid,phase,databaseid,amount2
15,53568419,89734,219906,15,2011-05-11 00:00:00,0,0,90720,2915,NonProfit,POLICY,230,3,1,0
16,84141863,87936,164657,243,2011-03-10 00:00:00,0,0,48362,2523,NonProfit,POLICY,1507,5,1,0
16,81442028,86632,15181625,243,2011-01-19 00:00:00,0,0,11501,2115,NonProfit,POLICY,1508,2,1,0
While the Postgres COPY command has a "header" setting that can ignore the first row, it only works for CSV files:
copy training from 'C:/testCSV.csv' DELIMITER ',' csv header;
When I try to run the same command on my txt file, I get an error:
copy training from 'C:/testTXTFile.txt' DELIMITER ',' csv header
ERROR: unquoted newline found in data
HINT: Use quoted CSV field to represent newline.
I have tried adding the "quote" and "escape" attributes, but the command just won't seem to work for the txt file:
copy training from 'C:/testTXTFile.txt' DELIMITER ',' csv header quote as E'"' escape as E'\\N';
ERROR: COPY escape must be a single one-byte character
Alternatively, I thought about running Java or creating a separate staging table to remove the first row... but these solutions are expensive and time consuming. I would need to load 9 GB of data just to remove the first row of headers... Are there other solutions out there to easily remove the first row of a txt file so that I can load the data into my Postgres database?
Use the HEADER option together with the CSV option:
\copy <table_name> from '/source_file.csv' delimiter ',' CSV HEADER ;
HEADER
Specifies that the file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table, and on input, the first line is ignored. This option is allowed only when using CSV format.
I've looked up the docs at https://www.postgresql.org/docs/10/sql-copy.html – what is written there about HEADER works not only for CSV files but for TSV files too!
My solution was this, in psql:
\COPY mytable FROM 'mydata.tsv' DELIMITER E'\t' CSV HEADER;
(In addition, mydata.tsv contained a header row, which was excluded from being copied into the database table.)
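For completeness, a sketch of the equivalent server-side COPY, assuming the file lives on the database server (psql's \copy reads the file from the client machine, while plain COPY reads it from the server; the path here is a placeholder):
COPY mytable FROM '/path/mydata.tsv' WITH (FORMAT CSV, DELIMITER E'\t', HEADER);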